Artificial chromosomes comprising concatemers for expressible nucleotide
sequences.

This application is a nonprovisional of U.S. provisional application Serial No.
60/300,865 filed 27 June 2001, which is hereby incorporated by reference in its
entirety. The application claims priority from Danish patent application number PA
2001 00130 filed 25 January 2001, which is hereby incorporated by reference in its
entirety. All patent and nonpatent references cited in those applications, or in the
present application, are also hereby incorporated by reference in their entirety.


Field of the invention 

The present invention discloses the use of artificial chromosomes for the
co-ordinated and controllable expression of large numbers of heterologous genes in a
single host cell. In particular, the invention relates to an artificial chromosome
comprising at least two co-ordinatedly expressible nucleotide sequences, an artificial
chromosome comprising at least two expression cassettes, and a host cell
comprising at least one of these artificial chromosomes, as well as to a host cell
comprising at least three different artificial chromosomes.

Prior art

An artificial chromosome is a vector based on functional entities derived from a
natural chromosome that can replicate and be stably maintained in a cell.

Artificial chromosomes are man-made linear or circular DNA molecules constructed
from essential cis-acting DNA sequence elements that are responsible for the
proper replication and partitioning of natural chromosomes (see Murray et al., Nature
305:189-193 (1983)). These essential elements are: (1) Autonomous Replication
Sequences (ARS), which have the properties of replication origins, the sites for
initiation of DNA replication; (2) Centromeres, the sites of kinetochore assembly,
responsible for proper distribution of replicated chromosomes at meiosis and
mitosis; and (3) Telomeres, specialised structures at the ends of linear
chromosomes that function to stabilise the ends and facilitate the complete
replication of the extreme termini of the DNA molecule.

Artificial chromosomes have been constructed in yeast using the three cloned
essential chromosomal elements. Murray et al., Nature 305:189-193 (1983),
disclose a cloning system based on the in vitro construction of linear DNA molecules
that can be transformed into yeast, where they are maintained as artificial
chromosomes. These artificial yeast chromosomes contain cloned genes,
replicators, centromeres and telomeres but have impaired centromeric function in
short (less than 20 kb) artificial chromosomes. Another yeast artificial chromosome,
called a functional minichromosome, is disclosed in US 4,464,472 (Carbon et al).

Artificial chromosomes have been constructed for a number of species and methods 
have been developed to generalise the design of artificial chromosomes for other 
species. 

US 5,270,201 (Richards et al) describes an artificial chromosome vector which is
especially adapted for insertion into plant cells such as Arabidopsis thaliana.

Hamilton et al (US 5,977,439) have developed a so-called BIBAC vector for
Agrobacterium-based transformation of plant cells. The BIBAC vector is based on a
Bacterial Artificial Chromosome (BAC) and a binary vector (BIN). The BIBAC vector
allows construction of plant genomic libraries with large DNA inserts that can be
introduced into plants by transformation mediated by Agrobacterium.

Artificial chromosomes based on Baculovirus may be used as artificial
chromosomes in insects such as Lepidoptera, including butterflies and moths (US
6,090,584 (Iatrou et al)).

Recently, methods for preparation of mammalian artificial chromosomes have also
been developed (US 6,133,503 (Scheffler) and US 6,077,697 (Hadlaczky et al)), and
it is envisaged that it will become possible to design suitable artificial
chromosomes for any desired species.

Artificial chromosomes can be regarded as giant vectors adapted to stably maintain
large nucleotide sequences in the host cell. Artificial chromosomes have been used
as libraries of nucleotide sequences and for gene therapy, especially gene therapy
involving the simultaneous expression of an entire metabolic pathway. Apart from
this, artificial chromosomes may be used as information storage vehicles and for
analysis and study of centromere function. Known artificial chromosomes include
chromosomes comprising up to 1000 megabases.

Another application of artificial chromosomes (WO 99/67374) is the transfer of
the ability to produce a secondary metabolite from an actinomycete that is the
original producer of the natural product to a different production host that has
desirable characteristics. The application involves the construction of a segment
of the chromosome of the original producer in an artificial chromosome that can
be stably maintained in a suitable production host.

Artificial chromosomes have not been used for the co-ordinated and controlled
expression of a number of different genes and artificial chromosomes have not been
used in the evolution of novel biochemical pathways.

Summary of the invention 

In a first aspect the invention relates to an artificial chromosome comprising at least
one nucleotide concatemer, the concatemer comprising in the 5'→3' direction a
cassette of nucleotide sequence of the general formula
[rs2-SP-PR-X-TR-SP-rs1]n
wherein

rs1 and rs2 together denote a restriction site,
SP denotes a spacer of at least two nucleotide bases,
PR denotes a promoter, capable of functioning in a cell,
X denotes an expressible nucleotide sequence,
TR denotes a terminator, and
SP denotes a spacer of at least two nucleotide bases, and
n ≥ 2.
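
As an illustration only, the general formula above can be written out as a small data structure. The following Python sketch is not part of the disclosed method: the class name `Cassette`, the function `concatemer_notation` and the gene names are hypothetical, and only the order of the elements and the two-base minimum for each spacer are taken from the formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cassette:
    """One repeat unit of the formula rs2-SP-PR-X-TR-SP-rs1 (hypothetical model)."""
    promoter: str      # PR
    expressible: str   # X
    terminator: str    # TR
    spacer_5: str      # SP on the promoter side
    spacer_3: str      # SP on the terminator side

    def __post_init__(self):
        # The formula requires each spacer to hold at least two nucleotide bases.
        for spacer in (self.spacer_5, self.spacer_3):
            if len(spacer) < 2:
                raise ValueError("spacer must be at least two nucleotide bases")

    def notation(self) -> str:
        # Write the cassette in the order given by the formula.
        return f"rs2-SP-{self.promoter}-{self.expressible}-{self.terminator}-SP-rs1"


def concatemer_notation(cassettes) -> str:
    """Write a list of cassettes as a bracketed chain of repeat units."""
    return "".join(f"[{c.notation()}]" for c in cassettes)


if __name__ == "__main__":
    units = [
        Cassette("MET25", "geneA", "ADH1", "AT", "GC"),  # placeholder gene names
        Cassette("CUP1", "geneB", "ADH1", "AT", "GC"),
    ]
    print(concatemer_notation(units))
```

Each unit carries its own promoter and terminator, which is what allows the units to be treated as interchangeable building blocks.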

Due to the highly ordered structure of the concatemer, the assembly of the
concatemer is easily performed, especially when the restriction site comprises sticky
ends having a pre-determined nucleotide sequence. The expressible nucleotide
sequences may conveniently arise from a cDNA library obtained from one or more
expression states, wherein the cDNA clones have been inserted into expression
cassettes. Following excision of the expression cassettes from the vector comprising
the construct in the cDNA library, the multitude of constructs may be concatenated
and inserted into an "empty" artificial chromosome for subsequent transformation
into a host cell.

The artificial chromosome according to the invention may comprise a selection of
expressible nucleotide sequences from just one expression state and can thus be
assembled from one library representing this expression state, or it may comprise
cassettes from a number of different expression states. The variation among and
between cassettes in the artificial chromosome may be such as to minimise the
chance of cross-over as the host cell undergoes cell division, such as through
minimising the level of repeat sequences occurring in any one concatemer, since it
is not an object of this embodiment of the invention to obtain inter- or
intrachromosomal recombination of the artificial chromosomes. Nor is it an object to
obtain recombination with the host genome or an episome of the host cells.

One advantage of the structure of the concatemer is that it can be recovered from
the host cell and subsequently digested with a restriction enzyme specific for the
rs1-rs2 restriction site. The building blocks of the concatemers may thus be
disassembled and reassembled at any point.

The cassettes of the concatemer may be joined head to tail, head to head or tail to
tail. This is due to the fact that most restriction enzymes leave two identical
overhangs, which may combine in either orientation at the same frequency. The
orientation does not affect expression of the expressible nucleotide sequences,
because each expressible nucleotide sequence is under the control of its own
promoter.
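
The point about identical overhangs can be checked with a short Python sketch. It assumes the AscI site named in the description of the drawings (recognition sequence GGCGCGCC, cut after the second G to leave a 5' CGCG overhang); the function names are illustrative only. An overhang that equals its own reverse complement can anneal to a copy of itself, so a cassette flanked by such ends can ligate in either orientation.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def is_self_complementary(overhang: str) -> bool:
    """True if the overhang can anneal to an identical overhang."""
    return overhang == reverse_complement(overhang)


# Assumption: AscI cuts GG^CGCGCC, leaving the 5' overhang CGCG on both ends.
ASCI_OVERHANG = "CGCG"

if __name__ == "__main__":
    print(is_self_complementary(ASCI_OVERHANG))
```

With non-identical, non-palindromic overhangs on the two ends, only one orientation would be possible, which is the case the text contrasts against.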

In a second aspect the invention relates to an artificial chromosome comprising at
least a first and a second expressible nucleotide sequence under the control of a
controllable promoter, the promoter of the first expressible nucleotide sequence
being controllable independently from the promoter of the other expressible
nucleotide sequence.

By having two or more expressible nucleotide sequences located on the same
artificial chromosome under the control of different promoters, the expression state
of a cell comprising the artificial chromosome can be manipulated in a co-ordinated
way through regulation of the two or more different promoters. The artificial
chromosomes are especially useful in the evolution of novel biochemical pathways,
where genes from multiple expression states (e.g. from multiple species) are
combined in one host cell. The single genes may be inserted under the control of
different promoters. Preferably one artificial chromosome comprises a unique
combination of promoters and genes. By having several artificial chromosomes
inserted into a number of cells, in principle any combination of sub-sets of genes
may be turned on or off in a population of cells by having random combinations of
genes and promoters represented. Furthermore, by up and down regulation of
specific promoters, different sub-sets of genes may be turned on and off in a
co-ordinated way and numerous combinations of expressed genes may be obtained in
just one cell. Furthermore, in biochemical pathway evolution, chances are great that
lethal genes are inserted into the host cell. Through down regulation of different
promoters, those controlling the lethal genes may be switched off, allowing evolution
of biochemical pathways from the remaining non-lethal genes.
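
The idea of switching sub-sets of genes on and off can be sketched in Python. The sketch assumes the two promoters named in the description of the drawings, MET25 and CUP1, with their commonly described regulation (MET25 repressed by methionine, CUP1 induced by copper ions). The gene names and the function names are hypothetical placeholders and do not come from the source.

```python
# Hypothetical layout of one artificial chromosome: promoter -> genes it controls.
GENES_BY_PROMOTER = {
    "MET25": ["geneA", "geneB"],
    "CUP1": ["geneC", "geneD"],
}


def promoter_active(promoter: str, methionine: bool, copper: bool) -> bool:
    """Assumed regulation: MET25 is on without methionine, CUP1 is on with copper."""
    if promoter == "MET25":
        return not methionine
    if promoter == "CUP1":
        return copper
    raise KeyError(promoter)


def expressed_genes(methionine: bool, copper: bool) -> list:
    """Return the sub-set of genes expressed under the given growth conditions."""
    expressed = []
    for promoter, genes in GENES_BY_PROMOTER.items():
        if promoter_active(promoter, methionine, copper):
            expressed.extend(genes)
    return sorted(expressed)


if __name__ == "__main__":
    # Two independently controllable promoters give four expression states.
    for methionine in (True, False):
        for copper in (False, True):
            print(methionine, copper, expressed_genes(methionine, copper))
```

Under these assumptions, a gene that proves lethal can be silenced by choosing the condition that switches off its promoter while the other sub-set stays on.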

In a further aspect the invention relates to a host cell comprising at least one
artificial chromosome comprising at least a first and a second expressible nucleotide
sequence under the control of a controllable promoter, the promoter of the first
expressible nucleotide sequence being controllable independently from the promoter
of the other expressible nucleotide sequence.


Such host cells are ideal candidates for the evolution of novel biochemical pathways
leading possibly to novel metabolites, such as drug candidates. The expression
state of the transgenic cell may be changed in a co-ordinated way through up or
down regulation of one or more controllable promoters. As explained above, identical
promoters preferably regulate a subset of expressible nucleotide sequences,
allowing the co-ordinated expression of sub-sets of genes. In a population of cells
according to the invention, multiple combinations of genes may be co-ordinatedly
expressed in this way.

In another aspect the invention relates to a host cell comprising at least two artificial
chromosomes containing a concatemer each. By having at least two artificial
chromosomes in one cell, evolution can be performed using techniques such as
traditional breeding.

In a still further aspect the invention relates to a host cell comprising at least three
artificial chromosomes, wherein the three chromosomes are different. More
preferably the invention relates to a host cell comprising at least four artificial
chromosomes, wherein the four chromosomes are different.

By having at least three different artificial chromosomes in one cell, a very high
number of foreign genes can be inserted and maintained in the host cell. The host
cell may either be used as a library cell for information storage purposes or the
artificial chromosomes may comprise expressible gene sequences for gene therapy,
for production of proteins, for production of compounds requiring the expression of a
high number of genes and/or for evolution of novel biochemical pathways.

Definitions 

Unless defined otherwise, all technical and scientific terms used herein have the
same meaning as is commonly understood by one of skill in the art to which this
invention belongs. All patents and publications referred to herein are incorporated by
reference.

As used herein, a mammalian artificial chromosome [MAC] is a piece of DNA that
can stably replicate and segregate alongside endogenous chromosomes. It has the
capacity to accommodate and express heterologous genes inserted therein. It is
referred to as a mammalian artificial chromosome because it includes an active
mammalian centromere. Plant artificial chromosomes and insect artificial
chromosomes refer to chromosomes that include plant and insect centromeres,
respectively. A human artificial chromosome [HAC] refers to chromosomes that
include human centromeres, BUGACs refer to artificial insect chromosomes, and
AVACs refer to avian artificial chromosomes. A yeast artificial chromosome (YAC)
refers to chromosomes that include a centromere that is functional in yeast, such as a
yeast centromere.

As used herein, stable maintenance of chromosomes occurs when at least about
85%, preferably 90%, more preferably 95%, of the cells retain the chromosome.
Stability is measured in the presence of a selective agent. Preferably these
chromosomes are also maintained in the absence of a selective agent. Stable
chromosomes also retain their structure during cell culturing, suffering neither
intrachromosomal nor interchromosomal rearrangements.
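
The retention thresholds in this definition can be expressed as a small helper. This is a minimal Python sketch, assuming that "at least about 85%" is applied as a hard cut-off, which is a simplification; the function name and the returned labels are hypothetical.

```python
def stability_level(retained_cells: int, total_cells: int) -> str:
    """Classify chromosome retention against the 85 / 90 / 95 percent thresholds."""
    if total_cells <= 0 or not 0 <= retained_cells <= total_cells:
        raise ValueError("counts must satisfy 0 <= retained_cells <= total_cells, total_cells > 0")
    # Integer arithmetic avoids floating point rounding at the boundaries.
    scaled = 100 * retained_cells
    if scaled >= 95 * total_cells:
        return "more preferred"
    if scaled >= 90 * total_cells:
        return "preferred"
    if scaled >= 85 * total_cells:
        return "stable"
    return "not stable"


if __name__ == "__main__":
    # Example: 180 of 200 scored colonies retain the chromosome.
    print(stability_level(180, 200))
```

The sketch covers only the retention criterion; the structural criterion (no intrachromosomal or interchromosomal rearrangements) would have to be assessed separately.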

As used herein, growth under selective conditions means growth of a cell under
conditions that require expression of a selectable marker for survival.

By a controllable promoter is meant a promoter, which can be controlled through 
external manipulations such as addition or removal of a compound from the 
surroundings of the cell, change of physical conditions, etc. 

Co-ordinated expression refers to the expression of a sub-set of genes which are
induced or repressed by the same external stimulus or stimuli.

Restriction site 

For the purposes of the present invention the abbreviation RSn (n=1,2,3, etc) is
used to designate a nucleotide sequence comprising a restriction site. A restriction
site is defined by a recognition sequence and a cleavage site. The cleavage site
may be located within or outside the recognition sequence. The abbreviation "rs1" or
"rs2" is used to designate the two ends of a restriction site after cleavage. The
sequence "rs1-rs2" together designates a complete restriction site.

The cleavage site of a restriction site may leave a double stranded polynucleotide
sequence with either blunt or sticky ends. Thus, "rs1" or "rs2" may designate either a
blunt or a sticky end.

In the notation used throughout the present invention, formulae like:
RS1-RS2-SP-PR-X-TR-SP-RS2'-RS1'

should be interpreted to mean that the individual sequences follow in the order
specified. This does not exclude that part of the recognition sequence of e.g. RS2
overlaps with the spacer sequence, but it is a strict requirement that all the items
except RS1 and RS1' are functional and remain functional after cleavage and
re-assemblage. Furthermore, the formulae do not exclude the possibility of having
additional sequences inserted between the listed items. For example, introns can be
inserted as described in the invention below and further spacer sequences can be
inserted between RS1 and RS2 and between TR and RS2'. What is important is that the
sequences remain functional.

Furthermore, when reference is made to the size of the restriction site and/or to 
specific bases within it, only the bases in the recognition site are referred to. 


Expression state 

An expression state is a state in any specific tissue of any individual organism at any
one time. Any change in conditions leading to changes in gene expression leads to
another expression state. Different expression states are found in different
individuals and in different species, but they may also be found in different organs in the
same species or individual, and in different tissue types in the same species or
individual. Different expression states may also be obtained in the same organ or
tissue in any one species or individual by exposing the tissues or organs to different
environmental conditions comprising but not limited to changes in age, disease,
infection, drought, humidity, salinity, exposure to xenobiotics, physiological effectors,
temperature, pressure, pH, light, gaseous environment, and chemicals such as toxins.

Brief description of the drawings 

Fig. 1 shows a flow chart of the steps leading from an expression state to
incorporation of the expressible nucleotide sequences in an entry library (a
nucleotide library according to the invention).

Fig. 2 shows a flow chart of the steps leading from an entry library comprising
expressible nucleotide sequences to evolvable artificial chromosomes (EVAC)
transformed into an appropriate host cell. Fig. 2a shows one way of producing the
EVACs which includes concatenation, size selection and insertion into an artificial
chromosome vector. Fig. 2b shows a one step procedure for concatenation and
ligation of vector arms to obtain EVACs.



Fig. 3 shows a model entry vector. MCS is a multi cloning site for inserting
expressible nucleotide sequences. Amp R is the gene for ampicillin resistance. Col
E is the origin of replication in E. coli. R1 and R2 are restriction enzyme recognition
sites.

Fig. 4 shows an example of an entry vector according to the invention, EVE4.
MET25 is a promoter, ADH1 is a terminator, f1 is an origin of replication for
filamentous phages, e.g. M13. Spacer 1 and spacer 2 are constituted by a few
nucleotides deriving from the multiple cloning site, MCS. SrfI and AscI are restriction
enzyme recognition sites. Other abbreviations, see Fig. 3. The sequence of the
vector is set forth in SEQ ID NO 1.

Fig. 5 shows an example of an entry vector according to the invention, EVE5. CUP1
is a promoter, ADH1 is a terminator, f1 is an origin of replication for filamentous
phages, e.g. M13. Spacer 1 and spacer 2 are constituted by a few nucleotides
deriving from the multiple cloning site, MCS. SrfI and AscI are restriction enzyme
recognition sites. Other abbreviations, see Fig. 3. The sequence of the vector is set
forth in SEQ ID NO 2.

Fig. 6 shows an example of an entry vector according to the invention, EVE8. CUP1
is a promoter, ADH1 is a terminator, f1 is an origin of replication for filamentous
phages, e.g. M13. Spacer 3 is a 550 bp fragment of lambda phage DNA. Spacer 4 is
an ARS1 sequence from yeast. SrfI and AscI are restriction enzyme recognition sites.
Other abbreviations, see Fig. 3. The sequence of the vector is set forth in SEQ ID
NO 3.

Fig. 7 shows a vector (pYAC4-AscI) for providing arms for an evolvable artificial
chromosome (EVAC) into which a concatemer according to the invention can be
cloned. TRP1, URA3, and HIS3 are yeast auxotrophic marker genes, and AmpR is
an E. coli antibiotic marker gene. CEN4 is a centromere and TEL are telomeres.
ARS1 and PMB1 allow replication in yeast and E. coli respectively. BamHI and AscI
are restriction enzyme recognition sites. The nucleotide sequence of the vector is
set forth in SEQ ID NO 4.

Fig. 8 shows the general concatenation strategy. On the left is shown a circular
entry vector with restriction sites, spacers, promoter, expressible nucleotide
sequence and terminator. These are excised and ligated randomly.

Table (lane assignments not recoverable): ratios of fragments to YAC arms (F/Y)
20/1, 2/1, 1/1, 1/2 and 1/5.

Legend: Lane M: molecular weight marker, λ phage DNA digested with PstI. Lanes
1-9: concatenation reactions. Ratio of fragments to YAC arms (F/Y) as in table.

Figs. 9a and 9b illustrate the integration of concatenation with synthesis of evolvable
artificial chromosomes and how concatemer size can be controlled by controlling the
ratio of vector arms to expression cassettes, as described in example 7.

Fig. 10. Library of EVAC transformed population shown under 4 different growth
conditions. Coloured phenotypes can be readily detected upon induction of the
Met25 and/or the Cup1 promoter.



Fig. 11. EVAC gel. Legend: PFGE of EVAC containing clones.
Lanes: a: yeast DNA PFGE markers (strain YNN295), b: lambda ladder, c:
non-transformed host yeast, 1-9: EVAC containing clones. EVACs in size range
1400-1600 kb. Lane 2 shows a clone containing 2 EVACs sized ~1500 kb and ~550 kb
respectively. The 550 kb EVAC comigrates with the 564 kb yeast chromosome,
resulting in an increased intensity of the band at 564 kb relative to the other
bands in the lane. Arrows point up to EVAC bands.

Detailed description

In describing the artificial chromosomes of this invention, the individual components
will first be considered, namely the functional elements of which the artificial
chromosome is composed, and other genes which contribute properties to
transformed cells.

Centromere 


The centromere is the junction between the two arms of a chromosome to which the
spindle fibers attach, either directly or indirectly, during mitosis and meiosis. Thus,
the centromere acts to orient the chromosome during cell splitting, so that the two
copies of the chromosome are directed to opposite poles of the cell prior to splitting
into two progeny. The centromere also acts as a binding site for binding the
chromosome to the spindle, thus ensuring that each daughter cell receives a copy of
the chromosome.

Each of the chromosomes of a eukaryote may have a centromere of different
composition. For the most part, the centromeres will be relatively small, usually
smaller than about 2 kbp, usually less than about 1.6 kbp, and may function with as
few as 0.2 kbp, more usually as few as 0.5 kbp. For the most part, the centromere
segment does not have long repetitive segments as observed with heterochromatin.

The centromere may be obtained from any eukaryotic host. Eukaryotic hosts include
plants, insects, molds, fungi, mammals and the like. Of particular interest are plants,
particularly food crops, fruit trees, and wood trees; fungi, such as mushrooms and yeast;
mammals, such as domestic animals and humans; and birds, such as domestic
poultry.


There are a number of different ways to obtain centromeres. Initially, the centromere
will normally be obtained from a host chromosome. Desirably, the host chromosome
has been mapped so as to establish an area which functions as the centromere and
is bordered by restriction sites. The area defined as the centromere frequently can
be detected by the substantial absence of recombination events in the vicinity of the
centromere. By appropriate mapping, one can define structural genes on opposite
sides of the centromere and restriction sites which allow for cleavage of the
chromosome to produce a segment including at least one structural gene and
preferably both structural genes. The structural genes serve as markers, since the
expression of the structural genes in a clone requires the presence of the
centromere.



The fragments will generally be less than ten percent in number of base pairs of the
chromosome from which the centromere containing fragment was derived.
Fragments may then be formed by restriction enzyme cleavage. The fragments may
be inserted into a shuttle vector containing a prokaryotic replication site and a
eukaryotic chromosomal replicator. By transforming a prokaryote auxotrophic
mutant which is complemented by at least one of the structural genes adjacent the
centromere, one can select for clones having a high probability of having the
centromere DNA sequence. Selective medium will permit selection of the
transformed clones.

The eukaryotic fragments inserted into the shuttle vector are then excised at the
restriction sites; the resulting mixture of eukaryotic segments will have a greatly
enhanced concentration of centromere containing segments. The mixture of DNA
fragments may now be inserted in the same shuttle vector or a different vector
having a replicating site for the host to be transformed, which may or may not be the
same host from which the centromere was obtained. Desirably, the host should be
an auxotroph for one of the structural genes associated with the centromere to allow
for rapid selection of host transformed with the hybrid DNA containing the structural
gene. By cultivating the host through a number of generations, transformed cells
having plasmids lacking the centromere will be unstable and reject the plasmid.
Those cells which retain the markers and are prototrophic in the marker will have
plasmids containing the centromere. Therefore, it is not necessary to employ an
auxotrophic mutant; it will be sufficient to employ a phenotypic marker, particularly
one allowing for selection.
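
Why cultivation through a number of generations enriches for centromere containing plasmids can be illustrated with a simple model. The Python sketch below assumes a constant probability of plasmid loss per cell division and equal growth of plasmid-bearing and plasmid-free cells; both the model and the loss rates used in the example are illustrative assumptions, not values from the source.

```python
def fraction_retaining(loss_per_division: float, generations: int) -> float:
    """Fraction of cells still carrying the plasmid after a number of generations.

    Assumes a constant, independent loss probability at every division.
    """
    if not 0.0 <= loss_per_division <= 1.0:
        raise ValueError("loss_per_division must lie between 0 and 1")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    return (1.0 - loss_per_division) ** generations


if __name__ == "__main__":
    # Hypothetical loss rates: a plasmid without and with a functional centromere.
    for label, loss in (("without centromere", 0.2), ("with centromere", 0.001)):
        print(label, round(fraction_retaining(loss, 20), 4))
```

Under this model even a moderate difference in loss rate leads to a large difference in retention after a few tens of generations, which is the basis of the selection described above.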

The plasmids are isolated from the cells and, by employing overlap hybridization, the
DNA sequence providing the centromere function is identified. The centromere may
then be isolated substantially free of the genes immediately adjacent the centromere
in the chromosome from which the centromere was derived. In this way, one can
have a DNA segment which provides the centromere function and can be bonded to
a wide variety of structural genes, operators, binding sites, regulating genes, or the
like, in addition to the one or more replicating sites.

Once the centromere segment has been isolated, the segment may be sequenced
and synthesized.

Replication Site

In order to have stable mitotic maintenance, a replication site in combination with the
centromere segment is necessary. The replication site is the DNA sequence which
is recognised by the enzymes and proteins involved in replication of the DNA
duplex. The replication site can be initially obtained by genomic cloning. The
chromosomes of the host can be fragmented either mechanically or preferably by
restriction enzymes. The fragments may then be inserted into an appropriate vector,
which may or may not have one or more genetic markers. Particularly, the vector
should lack a replication site which would allow for replication in the eukaryotic host
to be transformed.

After transformation and passage through a number of generations, one can select
for the presence of the marker. Only those cells containing a DNA fragment having a
replication site will be able to retain the plasmid to any detectable degree. The cells
may then be harvested and lysed, and the plasmid isolated. The inserted DNA fragment
may be excised and used for introduction of the replication site in combination with
the centromere. The replication site will hereinafter be referred to as an
autonomously replicating segment, ARS.

Where an autonomously replicating segment is known to be associated with a
structural gene, the structural gene may be employed as a marker. By transforming
hosts which are auxotrophic for the product expressed by the marker, one can
select for transformed cells which are able to grow in a selective medium. Only
those cells having the combination of the ARS and marker will survive in the
selective medium.

Once the ARS has been isolated as part of a larger fragment, the fragment may be
reduced in size, employing endo- or exonucleases, capable of cleavage or
processive oligonucleotide removal. The resulting fragments may be inserted in an
appropriate vector and used for transformation. Once again, only those cells which
are transformed with a functional ARS will be able to retain the plasmid in selective
medium. If the vector includes a centromere, nonselective medium may be
employed, since a plasmid containing only the ARS and not the centromere is
mitotically unstable.


The ARS fragment may or may not be joined to the native genes on opposite sides
of the ARS when combined with the centromere to form the artificial chromosome.
When the ARS employed is free of the native functional genes, it will normally be
less than about 1 kbp, usually less than about 0.5 kbp, and may be as small as 0.2
kbp.

As part of the artificial chromosome, the ARS need not be derived from the
same host as the centromere, nor from the same cell source as
the host cell to be transformed by the artificial chromosome.






Telomeres 



Telomeres, the last chromosomal element in lower eukaryotes to be cloned, are
thought to be involved in the priming of DNA replication at the chromosome end.
This is because conventional DNA polymerases are template dependent, synthesise
DNA in the 5' to 3' direction, and require an oligonucleotide primer to donate a 3' OH
group. When this primer is removed, unreplicated single-stranded gaps arise; most
of these gaps can be filled in by priming from 3' OH groups donated by newly
replicated strands located at the 5' end of the gap. However, the unreplicated gaps
which lie next to the extreme 5' end of the DNA duplex cannot be primed in this
manner. Consequently, telomeres must provide an alternative priming mechanism.



Telomeres are also responsible for the stability of chromosomal termini. Telomeres
act as "caps", suppressing the recombinogenic properties of free, unmodified DNA
ends. This reduces the formation of damaged and rearranged chromosomes which
arise as a consequence of recombination-mediated chromosome fusion events.



Telomeres may also contribute to the establishment or maintenance of intranuclear
chromatin organization through their association with the nuclear envelope.

Telomeric or telomeric-like DNA sequences have been cloned from several lower
eukaryotic organisms, principally protozoans and yeast. The ends of the
Tetrahymena linear DNA plasmid have been shown to function like a telomere on
linear plasmids in Saccharomyces cerevisiae (see Szostak, J. W., Cold Spring
Harbor Symp. Quant. Biol. 47:1187-1194 (1983)). A telomere from the flagellate
Trypanosoma has been cloned (see, for example, Blackburn et al., Cell 36:447-457
(1984)). A yeast telomeric sequence has been identified (see, for example, Shampay
et al., Nature 310:154-157 (1984)).

Telomeres have also been identified in mammalian chromosomes for use in
Mammalian Artificial Chromosomes (US 6,133,503).

Artificial chromosome 

The artificial chromosome is a combination of a DNA segment comprising a
centromeric function, a replicating site (ARS), and telomeres, and one or more
genes, including regulatory genes and structural genes, which are to be expressed
by the transformed host cell.

Transformation can be achieved by using calcium shock, by exposing host cell
spheroplasts to the plasmid DNA under conditions favoring spheroplast fusion and
then plating the spheroplasts in regeneration agar selecting for the desired
phenotype, or by other conventional techniques.

The transformed host cells may then be grown on selective or nonselective medium.
While the artificial chromosome has mitotic stability, it is well established that
aneuploid cells will frequently lose one of the chromosomes. Since the artificial
chromosome is not necessary for viability in nonselective medium, its loss will
not adversely affect the viability of the resulting "wild type" cell. Therefore, it
will usually be desirable to have a marker on the artificial chromosome which
provides for selective pressure for the transformed host cells.

The nature of the marker may be varied widely, providing for resistance to a cell
growth inhibitor; complementation of an auxotrophic mutation in the transformed
host; morphologic change; or the like.

The host cells according to this invention may comprise one or several artificial
chromosomes. When the cells comprise more than one artificial chromosome, their
presence may be ensured by using a common marker present on all chromosomes.
However, it may be more advantageous to provide each artificial chromosome with a
unique marker and select for cells having markers corresponding to the artificial
chromosomes that they are supposed to contain.
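
The difference between a common marker and unique markers can be illustrated with
a minimal sketch. The chromosome labels (AC1 to AC3) and the marker names (URA3,
LEU2, HIS3) are hypothetical and are not taken from this disclosure; the sketch
only shows that a common marker cannot reveal the loss of an individual
chromosome, whereas unique markers can.

```python
def markers_in_cell(chromosomes_present, layout):
    """Return the set of markers carried by a cell holding the given chromosomes."""
    return {layout[name] for name in chromosomes_present}


def survives_selection(cell_markers, selected_markers):
    """A cell grows only if it carries every marker that is selected for."""
    return set(selected_markers) <= set(cell_markers)


# Hypothetical layouts for three artificial chromosomes.
unique = {"AC1": "URA3", "AC2": "LEU2", "AC3": "HIS3"}  # one marker per chromosome
common = {"AC1": "URA3", "AC2": "URA3", "AC3": "URA3"}  # same marker on all

# A cell that has lost AC2 and AC3 and retains only AC1.
remaining = ["AC1"]

# Common marker: the loss goes undetected and the cell still grows.
print(survives_selection(markers_in_cell(remaining, common), common.values()))  # True
# Unique markers: the same cell fails selection.
print(survives_selection(markers_in_cell(remaining, unique), unique.values()))  # False
```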

Each cell according to the invention may comprise 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 or more
artificial chromosomes. Each of these chromosomes may be laid out as defined in
the claims.

The chromosomes may be maintained in haploid or diploid host cells. Haploid cells
may be combined to form diploid cells, which undergo meiosis. Upon meiosis new
combinations of chromosomes may be obtained in the offspring.

Origin of expressible nucleotide sequences 

The expressible nucleotide sequences that can be inserted into the vectors,
concatemers, and cells according to this invention encompass any type of
nucleotide such as RNA or DNA. Such a nucleotide sequence could be obtained e.g.
from cDNA, which by its nature is expressible. But it is also possible to use
sequences of genomic DNA coding for specific genes. Preferably, the expressible
nucleotide sequences correspond to full length genes such as substantially full
length cDNA, but nucleotide sequences coding for shorter peptides than the original
full length mRNAs may also be used. Shorter peptides may still retain catalytic
activity similar to that of the native proteins.

Another way to obtain expressible nucleotide sequences is through chemical
synthesis of nucleotide sequences coding for known peptide or protein sequences.
Thus the expressible DNA sequence does not have to be a naturally occurring
sequence, although it may be preferable for practical purposes to primarily use
naturally occurring nucleotide sequences. Whether the DNA is single or double
stranded will depend on the vector system used.

In most cases the orientation with respect to the promoter of an expressible
nucleotide sequence will be such that the coding strand is transcribed into a proper
mRNA. It is however conceivable that the sequence may be reversed, generating an
antisense transcript in order to block expression of a specific gene.
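
Reversing the orientation of an insert relative to the promoter corresponds, at
the sequence level, to placing its reverse complement downstream of the promoter.
A minimal sketch of that operation follows; the function name and the example
sequence are illustrative only.

```python
# Translation table mapping each DNA base to its complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


sense_insert = "ATGGCC"  # illustrative coding-strand fragment
antisense_insert = reverse_complement(sense_insert)
print(antisense_insert)  # GGCCAT
```

Applying the function twice returns the original sequence, which is a convenient
sanity check when tracking insert orientation.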

Cassettes 

An important aspect of the invention concerns a cassette of nucleotides in a highly
ordered sequence, the cassette having the general formula in 5'->3' direction:
[RS1-RS2-SP-PR-CS-TR-SP-RS2'-RS1']

wherein RS1 and RS1' denote restriction sites, RS2 and RS2' denote restriction
sites different from RS1 and RS1', SP individually denotes a spacer sequence of at
least two nucleotides, PR denotes a promoter, CS denotes a cloning site, and TR
denotes a terminator.
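
The general formula can be written down as a small data structure. The following
is a minimal sketch, not part of the disclosure: the class and field names are
illustrative, and SrfI and AscI are used as default sites only because they are
the R1 and R2 sites of the entry vectors described later in this document. Both
recognition sequences are palindromic, which is why the same string is used for
the primed and unprimed site.

```python
from dataclasses import dataclass

SRFI = "GCCCGGGC"  # SrfI recognition sequence (8 bases, blunt cutter)
ASCI = "GGCGCGCC"  # AscI recognition sequence (8 bases, sticky ends)


@dataclass
class Cassette:
    """Cassette laid out as RS1-RS2-SP-PR-CS-TR-SP-RS2'-RS1' (5'->3')."""
    promoter: str
    cloning_site: str
    terminator: str
    spacer_5: str       # spacer between RS2 and PR
    spacer_3: str       # spacer between TR and RS2'
    rs1: str = SRFI
    rs2: str = ASCI

    def __post_init__(self):
        # Each spacer must comprise at least two nucleotides.
        if len(self.spacer_5) < 2 or len(self.spacer_3) < 2:
            raise ValueError("spacers must be at least two nucleotides long")
        # RS2/RS2' must differ from RS1/RS1'.
        if self.rs1 == self.rs2:
            raise ValueError("RS1 and RS2 must be different restriction sites")

    def elements(self):
        """Return (label, sequence) pairs in 5'->3' order."""
        return [
            ("RS1", self.rs1), ("RS2", self.rs2), ("SP", self.spacer_5),
            ("PR", self.promoter), ("CS", self.cloning_site),
            ("TR", self.terminator), ("SP", self.spacer_3),
            ("RS2'", self.rs2), ("RS1'", self.rs1),
        ]

    def sequence(self):
        """Concatenate all elements into one 5'->3' string."""
        return "".join(seq for _, seq in self.elements())
```

The placeholder strings passed for the promoter, cloning site, and terminator
would in practice be full-length sequences; the sketch only fixes their order.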

It is an advantage to have two different restriction sites flanking both sides of the
expression construct. By treating the primary vectors with restriction enzymes
cleaving both restriction sites, the expression construct and the primary vector will
be left with two non-compatible ends. This facilitates a concatenation process, since
the empty vectors do not participate in the concatenation of expression cassettes.

Restriction sites 

In principle, any restriction site for which a restriction enzyme is known can be
used. These include the restriction enzymes generally known and used in the field of
molecular biology such as those described in Sambrook, Fritsch, Maniatis, "A
Laboratory Manual", 2nd edition, Cold Spring Harbor Laboratory Press, 1989.

The restriction site recognition sequences preferably are of a substantial length, so
that the likelihood of occurrence of an identical restriction site within the cloned
oligonucleotide is minimised. Thus the first restriction site may comprise at least 6
bases, but more preferably the recognition sequence comprises at least 7 or 8
bases. Restriction sites having 7 or more non-N bases in the recognition sequence
are generally known as "rare restriction sites" (see example 6). However, the
recognition sequence may also be at least 10 bases, such as at least 15 bases, for
example at least 16 bases, such as at least 17 bases, for example at least 18 bases,
such as at least 19 bases, for example at least 20 bases, such as at least 21 bases,
for example at least 22 bases, such as at least 23 bases, for example at least 25
bases, such as at least 30 bases, for example at least 35 bases, such as at least 40
bases, for example at least 45 bases, such as at least 50 bases.

Preferably the first restriction site, RS1 and RS1', is recognised by a restriction
enzyme generating blunt ends of the double stranded nucleotide sequences. By
generating blunt ends at this site, the risk that the vector participates in a
subsequent concatenation is greatly reduced. The first restriction site may also give
rise to sticky ends, but these are then preferably non-compatible with the sticky ends
resulting from the second restriction site, RS2 and RS2', and with the sticky ends in
the AC.

According to a preferred embodiment of the invention, the second restriction site,
RS2 and RS2', comprises a rare restriction site. The longer the recognition
sequence of the rare restriction site, the rarer it is and the less likely it is that the
restriction enzyme recognising it will cleave the nucleotide sequence at other -
undesired - positions.
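
The relation between recognition sequence length and rarity can be made concrete
with a small calculation. The sketch below assumes a random sequence with equal
base frequencies, which real inserts only approximate: a fully specified site of
n bases is then expected once per 4^n positions. The function names are
illustrative.

```python
def expected_sites(sequence_length, site_length):
    """Expected number of occurrences of one fully specified site in a
    random sequence with equal base frequencies (simplifying assumption)."""
    windows = max(sequence_length - site_length + 1, 0)
    return windows / 4 ** site_length


def count_sites(sequence, site):
    """Count occurrences (overlaps included) of a site on the given strand."""
    sequence, site = sequence.upper(), site.upper()
    count = 0
    start = sequence.find(site)
    while start != -1:
        count += 1
        start = sequence.find(site, start + 1)
    return count


for n in (4, 6, 8):
    print(n, 4 ** n)  # 4 -> 256, 6 -> 4096, 8 -> 65536 positions per expected site
```

Under this assumption an 8-base site is expected 16 times less often than a
6-base site, and `count_sites` can be used to confirm that a chosen site is
absent from a particular insert before cloning.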

The rare restriction site may furthermore serve as a PCR priming site. Thereby it is
possible to copy the cassettes via PCR techniques and thus indirectly "excise" the
cassettes from a vector.



Spacer sequence 



The spacer sequence located between the RS2 and the PR sequence is preferably
a non-transcribed spacer sequence. The purpose of the spacer sequence(s) is to
minimise recombination between different concatemers present in the same cell or
between cassettes present in the same concatemer, but it may also serve the
purpose of making the nucleotide sequences in the cassettes more "host" like. A further
purpose of the spacer sequence is to reduce the occurrence of hairpin formation
between adjacent palindromic sequences, which may occur when cassettes are
assembled head to head or tail to tail. Spacer sequences may also be convenient
for introducing short conserved nucleotide sequences that may serve e.g. as PCR
primer sites or as target for hybridization to e.g. nucleic acid or PNA or LNA probes
allowing affinity purification of cassettes.

The cassette may also optionally comprise another spacer sequence of at least two
nucleotides between TR and RS2. When cassettes are cut out from a vector and
concatenated into concatemers of cassettes, the spacer sequences together ensure
that there is a certain distance between two successive identical promoter and/or
terminator sequences. This distance may comprise at least 50 bases, such as at
least 60 bases, for example at least 75 bases, such as at least 100 bases, for
example at least 150 bases, such as at least 200 bases, for example at least 250
bases, such as at least 300 bases, for example at least 400 bases, for example at
least 500 bases, such as at least 750 bases, for example at least 1000 bases, such
as at least 1100 bases, for example at least 1200 bases, such as at least 1300
bases, for example at least 1400 bases, such as at least 1500 bases, for example at
least 1600 bases, such as at least 1700 bases, for example at least 1800 bases,
such as at least 1900 bases, for example at least 2000 bases, such as at least 2100
bases, for example at least 2200 bases, such as at least 2300 bases, for example at
least 2400 bases, such as at least 2500 bases, for example at least 2600 bases,
such as at least 2700 bases, for example at least 2800 bases, such as at least 2900
bases, for example at least 3000 bases, such as at least 3200 bases, for example at
least 3500 bases, such as at least 3800 bases, for example at least 4000 bases,
such as at least 4500 bases, for example at least 5000 bases, such as at least 6000
bases.

The number of nucleotides in the spacer located 5' to the PR sequence and in the
one located 3' to the TR sequence may be any number. However, it may be
advantageous to ensure that at least one of the spacer sequences comprises
between 100 and 2500 bases, preferably between 200 and 2300 bases, more
preferably between 300 and 2100 bases, such as between 400 and 1900 bases,
more preferably between 500 and 1700 bases, such as between 600 and 1500
bases, more preferably between 700 and 1400 bases.

If the intended host cell is yeast, the spacers present in a concatemer should
preferably comprise a combination of a few ARSes with varying lambda phage DNA
fragments.

Preferred examples of spacer sequences include but are not limited to: lambda
phage DNA, prokaryotic genomic DNA such as E. coli genomic DNA, and ARSes.

Promoter 

A promoter is a DNA sequence to which RNA polymerase binds and initiates
transcription. The promoter determines the polarity of the transcript by specifying
which strand will be transcribed.

• Bacterial promoters normally consist of -35 and -10 (relative to the 
transcriptional start) consensus sequences which are bound by a specific 
Sigma factor and RNA polymerase. 

• Eukaryotic promoters are more complex. Most promoters utilized in
expression vectors are transcribed by RNA polymerase II. General
transcription factors (GTFs) first bind specific sequences near the
transcriptional start and then recruit the binding of RNA polymerase II. In
addition to these minimal promoter elements, small sequence elements are
recognized specifically by modular DNA-binding / trans-activating proteins
(e.g. AP-1, SP-1) which regulate the activity of a given promoter.

• Viral promoters may serve the same function as bacterial and eukaryotic
promoters. Upon viral infection of their host, viral promoters direct
transcription either by using host transcriptional machinery or by supplying
virally encoded enzymes to substitute part of the host machinery. Viral
promoters are recognised by the transcriptional machinery of a large number
of host organisms and are therefore often used in cloning and expression
vectors.

Promoters may furthermore comprise regulatory elements, which are DNA
sequence elements which act in conjunction with promoters and bind either
repressors (e.g., lacO/lacIq repressor system in E. coli) or inducers (e.g.,
GAL1/GAL4 inducer system in yeast). In either case, transcription is virtually
"shut off" until the promoter is derepressed or induced, at which point
transcription is "turned on". The choice of promoter in the cassette is primarily
dependent on the host organism into which the cassette is intended to be
inserted. An important requirement to this end is that the promoter should
preferably be capable of functioning in the host cell in which the expressible
nucleotide sequence is to be expressed.

Preferably the promoter is an externally controllable promoter, such as an inducible
promoter and/or a repressible promoter. The promoter may be controllable
(repressible/inducible) by chemicals, i.e. by the absence/presence of chemical
inducers, e.g. metabolites, substrates, metals, hormones, sugars. The promoter may
likewise be controllable by certain physical parameters such as temperature, pH,
redox status, growth stage, developmental stage, or the promoter may be
inducible/repressible by a synthetic inducer/repressor such as the gal inducer.

In order to avoid unintentional interference with the gene regulation systems of the
host cell, and in order to improve controllability of the co-ordinated gene expression,
the promoter is preferably a synthetic promoter. Suitable promoters are described in
US 5,798,227 and US 5,667,986. Principles for designing suitable synthetic eukaryotic
promoters are disclosed in US 5,559,027, US 5,877,018 or US 6,072,050.

Synthetic inducible eukaryotic promoters for the regulation of transcription of a gene
may achieve improved levels of protein expression and lower basal levels of gene
expression. Such promoters preferably contain at least two different classes of
regulatory elements, usually by modification of a native promoter containing one of
the inducible elements by inserting the other of the inducible elements. For example,
additional metal responsive elements (MREs) and/or glucocorticoid responsive
elements (GREs) may be provided to native promoters. Additionally, one or more
constitutive elements may be functionally disabled to provide the lower basal levels
of gene expression.

Preferred examples of promoters include but are not limited to those promoters being
induced and/or repressed by any factor selected from the group comprising
carbohydrates, e.g. galactose; low inorganic phosphate levels; temperature, e.g.
low or high temperature shift; metals or metal ions, e.g. copper ions; hormones, e.g.
dihydrotestosterone; deoxycorticosterone; heat shock (e.g. 39°C); methanol; redox
status; growth stage, e.g. developmental stage; synthetic inducers, e.g. gal inducer.

Examples of such promoters include ADH 1, PGK 1, GAP 491, TPI, PYK, ENO,
PMA 1, PHO5, GAL 1, GAL 2, GAL 10, MET25, ADH2, MEL 1, CUP 1, HSE, AOX,
MOX, SV40, CaMV, Opaque-2, GRE, ARE, PGK/ARE hybrid, CYC/GRE hybrid,
TPI/α2 operator, AOX 1, MOX A.

More preferably, however, the promoter is selected from hybrid promoters such as
PGK/ARE hybrid, CYC/GRE hybrid or from synthetic promoters. Such promoters
can be controlled without interfering too much with the regulation of native genes in
the expression host.



Yeast promoters

In the following, examples of known yeast promoters that may be used in
conjunction with the present invention are shown. The examples are in no way
limiting and only serve to indicate to the skilled practitioner how to select or design
promoters that are useful according to the present invention.



Although numerous transcriptional promoters which are functional in yeasts have
been described in the literature, only some of them have proved effective for the
production of polypeptides by the recombinant route. There may be mentioned in
particular the promoters of the PGK genes (3-phosphoglycerate kinase), TDH genes
encoding GAPDH (glyceraldehyde phosphate dehydrogenase), TEF1 genes
(elongation factor 1), and MFα1 (α sex pheromone precursor), which are considered
as strong constitutive promoters, or alternatively the regulatable promoter CYC1,
which is repressed in the presence of glucose, or PHO5, which can be regulated by
thiamine. However, for reasons which are often unexplained, they do not always allow
the effective expression of the genes which they control. In this context, it is always
advantageous to be able to have new promoters in order to generate new effective
host/vector systems. Furthermore, having a choice of effective promoters in a given
cell also makes it possible to envisage the production of multiple proteins in this
same cell (for example several enzymes of the same metabolic chain) while
avoiding the problems of recombination between homologous sequences.



In general, a promoter region is situated in the 5' region of the genes and comprises
all the elements allowing the transcription of a DNA fragment placed under their
control, in particular
(1) a so-called minimal promoter region comprising the TATA box and the site of
initiation of transcription, which determines the position of the site of initiation as
well as the basal level of transcription. In Saccharomyces cerevisiae, the length
of the minimal promoter region is relatively variable. Indeed, the exact location of
the TATA box varies from one gene to another and may be situated from -40 to
-120 nucleotides upstream of the site of the initiation (Chen and Struhl, 1985,
EMBO J., 4, 3273-3280)

(2) sequences situated upstream of the TATA box (immediately upstream up to
several hundreds of nucleotides) which make it possible to ensure an effective
level of transcription either constitutively (relatively constant level of transcription
all along the cell cycle, regardless of the conditions of culture) or in a regulatable
manner (activation of transcription in the presence of an activator and/or
repression in the presence of a repressor). These sequences may be of several
types: activator, inhibitor, enhancer, inducer, repressor, and may respond to
cellular factors or varied culture conditions.
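
The positional window given in item (1) can be applied as a simple scan. The
following minimal sketch looks for candidate TATA motifs between -120 and -40
relative to the initiation site. It assumes that the input string is the
upstream region written 5'->3' with its last base at position -1, and it uses
the bare motif "TATA" as a simplification; real TATA boxes are more degenerate.
The function name and parameters are illustrative.

```python
import re


def find_tata(upstream, window=(-120, -40), motif="TATA"):
    """Return positions (relative to the initiation site) at which `motif`
    starts within `window`. The last base of `upstream` is position -1."""
    upstream = upstream.upper()
    length = len(upstream)
    low, high = window
    hits = []
    # A lookahead pattern reports overlapping matches as well.
    for match in re.finditer(f"(?={re.escape(motif)})", upstream):
        position = match.start() - length
        if low <= position <= high:
            hits.append(position)
    return hits


# Illustrative upstream region: a TATAAA element 66 bases before position -1.
region = "G" * 50 + "TATAAA" + "G" * 60
print(find_tata(region))  # [-66]
```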

Examples of such promoters are the ZZA1 and ZZA2 promoters disclosed in US
5,641,661, the EF1-α protein promoter and the ribosomal protein S7 gene promoter
disclosed in WO 97/44470, the COX 4 promoter and two unknown promoters (SEQ
ID No: 1 and 2 in the document) disclosed in US 5,952,195. Other useful promoters
include the HSP150 promoter disclosed in WO 98/54339 and the SV40 and RSV
promoters disclosed in US 4,870,013 as well as the PyK and GAPDH promoters
disclosed in EP 0 329 203 A1.

Synthetic yeast promoters

More preferably the invention employs synthetic promoters. Synthetic
promoters are often constructed by combining the minimal promoter region of one
gene with the upstream regulating sequences of another gene. Enhanced promoter
control may be obtained by modifying specific sequences in the upstream regulating
sequences, e.g. through substitution or deletion or through inserting multiple copies
of specific regulating sequences. One advantage of using synthetic promoters is that
they may be controlled without interfering too much with the native promoters of the
host cell.

One such synthetic yeast promoter comprises promoters or promoter elements of
two different yeast-derived genes, yeast killer toxin leader peptide, and amino
terminus of IL-1β (WO 98/54339).

Another example of a yeast synthetic promoter is disclosed in US 5,436,136 (Hinnen
et al.), which concerns a yeast hybrid promoter including a 5' upstream promoter
element comprising upstream activation site(s) of the yeast PHO5 gene and a 3'
downstream promoter element of the yeast GAPDH gene starting at nucleotide -300
to -180 and ending at nucleotide -1 of the GAPDH gene.

Another example of a yeast synthetic promoter is disclosed in US 5,089,398
(Rosenberg et al.). This disclosure describes a promoter with the general formula
-(P.R.(2)-P.R.(1))-
wherein:
P.R.(1) is the promoter region proximal to the coding sequence and having the
transcription initiation site, the RNA polymerase binding site, and including the TATA
box, the CAAT sequence, as well as translational regulatory signals, e.g., capping
sequence, as appropriate;
P.R.(2) is the promoter region joined to the 5'-end of P.R.(1) associated with
enhancing the efficiency of transcription of the RNA polymerase binding region.

US 4,945,046 (Horii et al.) discloses a further example of how to design a
synthetic yeast promoter. This specific promoter comprises promoter elements
derived both from yeast and from a mammal. The hybrid promoter consists
essentially of Saccharomyces cerevisiae PHO5 or GAP-DH promoter from which the
upstream activation site (UAS) has been deleted and replaced by the early
enhancer region derived from SV40 virus.

Cloning site 

The cloning site in the cassette in the primary vector should be designed so that any
nucleotide sequence can be cloned into it.

The cloning site in the cassette preferably allows directional cloning. This
ensures that transcription in a host cell is performed from the coding strand in the
intended direction and that the translated peptide is identical to the peptide for which
the original nucleotide sequence codes.

However, according to some embodiments it may be advantageous to insert the
sequence in the opposite direction. According to these embodiments, so-called
antisense constructs may be inserted which prevent functional expression of specific
genes involved in specific pathways. Thereby it may become possible to divert
metabolic intermediates from a prevalent pathway to another less dominant
pathway.

The cloning site in the cassette may comprise multiple cloning sites, generally
known as an MCS or polylinker, which is a synthetic DNA sequence encoding a
series of restriction endonuclease recognition sites. These sites are engineered for
convenient cloning of DNA into a vector at a specific position and for directional
cloning of the insert.

Cloning of cDNA does not have to involve the use of restriction enzymes. Other
alternative systems include but are not limited to:

• Creator™ Cre-loxP system from Clontech, which uses recombination and loxP
sites

• use of Lambda attachment sites (att-λ), such as the Gateway™ system from Life
Technologies.

Both of these systems are directional.

Terminator

The role of the terminator sequence is to limit transcription to the length of the
coding sequence. An optimal terminator sequence is thus one which is capable of
performing this act in the host cell.

In prokaryotes, sequences known as transcriptional terminators signal the RNA 
polymerase to release the DNA template and stop transcription of the nascent RNA. 

In eukaryotes, RNA molecules are transcribed well beyond the end of the mature
mRNA molecule. New transcripts are enzymatically cleaved and modified by the
addition of a long sequence of adenylic acid residues known as the poly-A tail. A
polyadenylation consensus sequence is located about 10 to 30 bases upstream
from the actual cleavage site.

Preferred examples of yeast derived terminator sequences include, but are not
limited to: ADN1, CYC1, GPD, ADH1 alcohol dehydrogenase.

Intron 

Optionally, the cassette in the vector comprises an intron sequence, which may be
located 5' or 3' to the expressible nucleotide sequence. The design and layout of
introns is well known in the art. The choice of intron design largely depends on the
intended host cell, in which the expressible nucleotide sequence is eventually to be
expressed. The effects of having an intron sequence in the expression cassettes are
those generally associated with intron sequences.

Examples of yeast introns can be found in the literature and in specific databases
such as Ares Lab Yeast Intron Database (Version 2.1) as updated on 15 April 2000.
Earlier versions of the database as well as extracts of the database have been
published in: "Genome-wide bioinformatic and molecular analysis of introns in
Saccharomyces cerevisiae." by Spingola M, Grate L, Haussler D, Ares M Jr. (RNA
1999 Feb;5(2):221-34) and "Test of intron predictions reveals novel splice sites,
alternatively spliced mRNAs and new introns in meiotically regulated genes of
yeast." by Davis CA, Grate L, Spingola M, Ares M Jr. (Nucleic Acids Res 2000 Apr
15;28(8):1700-6).

Primary vectors (entry vectors) 

By the term entry vector is meant a vector for storing and amplifying cDNA or other
expressible nucleotide sequences using the cassettes according to the present
invention. The primary vectors are preferably able to propagate in E. coli or any
other suitable standard host cell. They should preferably be amplifiable and amenable
to standard normalisation and enrichment procedures.

The primary vector may be of any type of DNA that has the basic requirements of a)
being able to replicate itself in at least one suitable host organism, b) allowing
insertion of foreign DNA which is then replicated together with the vector, and c)
preferably allowing selection of vector molecules that contain insertions of said
foreign DNA. In a preferred embodiment the vector is able to replicate in standard
hosts like yeasts and bacteria, and it should preferably have a high copy number per
host cell. It is also preferred that the vector, in addition to a host specific origin of
replication, contains an origin of replication for a single stranded virus, such as e.g.
the f1 origin for filamentous phages. This will allow the production of single stranded
nucleic acid which may be useful for normalisation and enrichment procedures of
cloned sequences. A vast number of cloning vectors have been described which are
commonly used and references may be given to e.g. Sambrook, J.; Fritsch, E.F.; and
Maniatis, T. (1989) Molecular Cloning: A Laboratory Manual, Cold Spring Harbor
Laboratory Press, USA; Netherlands Culture Collection of Bacteria
(www.cbs.knaw.nl/NCCB/collection.htm); or Department of Microbial Genetics,
National Institute of Genetics, Yata 1111 Mishima Shizuoka 411-8540, Japan
(www.shigen.nig.ac.jp/cvector/cvector.html). A few type-examples that are the
parents of many popular derivatives are M13mp10, pUC18, Lambda gt 10, and
pYAC4. Examples of primary vectors include but are not limited to M13K07,
pBR322, pUC18, pUC19, pUC118, pUC119, pSP64, pSP65, pGEM-3, pGEM-3Z,
pGEM-3Zf(-), pGEM-4, pGEM-4Z, πAN13, pBluescript II, CHARON 4A, r,
CHARON 21A, CHARON 32, CHARON 33, CHARON 34, CHARON 35, CHARON
40, EMBL3A, λ2001, λDASH, λFIX, λgt10, λgt11, λgt18, λgt20, λgt22, λORF8,
λZAP/R, pJB8, c2RB, pcos1EMBL.

Methods for cloning of cDNA or genomic DNA into a vector are well known in the
art. Reference may be given to J. Sambrook, E.F. Fritsch, T. Maniatis: Molecular
Cloning, A Laboratory Manual (2nd edition, Cold Spring Harbor Laboratory Press,
1989).

One example of a circular model entry vector is described in Figure 3. The vector,
EVE, contains the expression cassette R1-R2-Spacer-Promoter-Multi Cloning
Site-Terminator-Spacer-R2-R1. The vector furthermore contains a gene for ampicillin
resistance, AmpR, and an origin of replication for E. coli, ColE1.

The entry vectors EVE4, EVE5, and EVE8 are shown in Figures 4, 5, and 6. These all
contain SrfI as R1 and AscI as R2. Both of these sites are palindromic and are
regarded as rare restriction sites having 8 bases in the recognition sequence. The
vectors furthermore contain the AmpR ampicillin resistance gene, and the ColE1
origin of replication for E. coli as well as f1, which is an origin of replication for
filamentous phages, such as M13. EVE4 (Fig. 4) contains the MET25 promoter and
the ADH1 terminator. Spacer 1 and spacer 2 are short sequences deriving from the
multiple cloning site, MCS. EVE5 (Fig. 5) contains the CUP1 promoter and the
ADH1 terminator. EVE8 (Fig. 6) contains the CUP1 promoter and the ADH1
terminator. The spacers of EVE8 are a 550 bp lambda phage DNA (spacer 3) and
an ARS sequence from yeast (spacer 4).

Nucleotide library (entry library) 

Methods as well as suitable vectors and host cells for constructing and maintaining
a library of nucleotide sequences in a cell are well known in the art. The primary
requirement for the library is that it should be possible to store and amplify in it a
number of primary vectors (constructs) according to this invention, the vectors
(constructs) comprising expressible nucleotide sequences from at least one
expression state and wherein at least two vectors (constructs) are different.

One specific example of such a library is the well known and widely employed cDNA
library. The advantage of the cDNA library is mainly that it contains only DNA
sequences corresponding to transcribed messenger RNA in a cell. Suitable methods
are also available to purify the isolated mRNA or the synthesised cDNA so that only
substantially full-length cDNA is cloned into the library.

Methods for optimisation of the process to yield substantially full length cDNA may
comprise size selection, e.g. electrophoresis, chromatography, precipitation, or may
comprise ways of increasing the likelihood of getting full length cDNAs, e.g. the
SMART™ method (Clontech) or the CapTrap™ method (Stratagene).



Preferably the method for making the nucleotide library comprises obtaining a
substantially full length cDNA population comprising a normalised representation of
cDNA species. More preferably a substantially full length cDNA population
comprises a normalised representation of cDNA species characteristic of a given
expression state.

Normalisation reduces the redundancy of clones representing abundant mRNA
species and increases the relative representation of clones from rare mRNA
species.
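
The effect of normalisation on relative representation can be shown with a purely
numerical toy example. The sketch below is not a model of the hybridisation-based
protocols used in practice: it simply caps the number of clones retained per mRNA
species, and the species names, clone counts, and cap value are hypothetical.

```python
def relative_representation(counts):
    """Return each species' fraction of the total number of clones."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def normalise(counts, cap):
    """Toy normalisation: keep at most `cap` clones per mRNA species."""
    return {name: min(n, cap) for name, n in counts.items()}


# Hypothetical clone counts in a non-normalised library.
library = {"abundant_mRNA": 9000, "medium_mRNA": 900, "rare_mRNA": 10}

before = relative_representation(library)
after = relative_representation(normalise(library, cap=100))

# The rare species makes up a larger share of the library after normalisation.
print(f"rare before: {before['rare_mRNA']:.4f}")
print(f"rare after:  {after['rare_mRNA']:.4f}")
```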

Methods for normalisation of cDNA libraries are well known in the art. Reference
may be given to suitable protocols for normalisation such as those described in US
5,763,239 (DIVERSA) and WO 95/08647 and WO 95/11986, and Bonaldo, Lennon,
Soares, Genome Research 1996, 6:791-806; Ali, Holloway, Taylor, Plant Mol Biol
Reporter, 2000, 18:123-132.

Enrichment methods are used to isolate clones representing mRNA which are
characteristic of a particular expression state. A number of variations of the method
broadly termed subtractive hybridisation are known in the art. Reference may be
given to Sive, John, Nucleic Acid Res, 1988, 16:10937; Diatchenko, Lau, Campbell
et al, PNAS, 1996, 93:6025-6030; Carninci, Shibata, Hayatsu, Genome Res, 2000,
10:1617-30; Bonaldo, Lennon, Soares, Genome Research 1996, 6:791-806; Ali,
Holloway, Taylor, Plant Mol Biol Reporter, 2000, 18:123-132. For example,
enrichment may be achieved by doing additional rounds of hybridization similar to
normalization procedures, using e.g. cDNA from a library of abundant clones or
simply a library representing the uninduced state as a driver against a tester library
from the induced state. Alternatively mRNA or PCR amplified cDNA derived from the
expression state of choice can be used to subtract common sequences from a tester
library. The choice of driver and tester population will depend on the nature of target
expressible nucleotide sequences in each particular experiment.

In the library an expressible nucleotide sequence coding for one peptide is
preferably found in different but similar vectors under the control of different
promoters. Preferably the library comprises at least three primary vectors with an
expressible nucleotide sequence coding for the same peptide under the control of
three different promoters. More preferably the library comprises at least four primary
vectors with an expressible nucleotide sequence coding for the same peptide under
the control of four different promoters. More preferably the library comprises at least
five primary vectors with an expressible nucleotide sequence coding for the same
peptide under the control of five different promoters, such as comprises at least six
primary vectors with an expressible nucleotide sequence coding for the same
peptide under the control of six different promoters, for example comprises at least
seven primary vectors with an expressible nucleotide sequence coding for the same
peptide under the control of seven different promoters, for example comprises at
least eight primary vectors with an expressible nucleotide sequence coding for the
same peptide under the control of eight different promoters, such as comprises at
least nine primary vectors with an expressible nucleotide sequence coding for the
same peptide under the control of nine different promoters, for example comprises
at least ten primary vectors with an expressible nucleotide sequence coding for the
same peptide under the control of ten different promoters.

The expressible nucleotide sequence coding for the same peptide preferably
comprises essentially the same nucleotide sequence, more preferably the same
nucleotide sequence.

By having a library with what may be termed one gene under the control of a
number of different promoters in different vectors, it is possible to construct from the
nucleotide library an array of combinations of genes and promoters. Preferably, one
library comprises a complete or substantially complete combination such as a two
dimensional array of genes and promoters, wherein substantially all genes are found
under the control of substantially all of a selected number of promoters.

According to another embodiment of the invention the nucleotide library comprises
combinations of expressible nucleotide sequences combined in different vectors
with different spacer sequences and/or different intron sequences. Thus any one
expressible nucleotide sequence may be combined in a two, three, four or five
dimensional array with different promoters and/or different spacers and/or different
introns and/or different terminators. The two, three, four or five dimensional array
may be complete or incomplete, since not all combinations will have to be present.
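
The multi-dimensional array described above can be pictured as a Cartesian product of the element lists. The following is only a minimal sketch: the gene, promoter, spacer and terminator names are hypothetical placeholders, and the helper functions are illustrative rather than part of the disclosed method.

```python
from itertools import product

# Hypothetical element names, chosen only for illustration.
genes = ["geneA", "geneB", "geneC"]
promoters = ["P1", "P2"]
spacers = ["SP_short", "SP_long"]
terminators = ["T1"]


def full_array(*dimensions):
    """Return every combination that takes one element from each dimension."""
    return list(product(*dimensions))


def is_complete(library, *dimensions):
    """True if the library contains every possible combination."""
    return set(full_array(*dimensions)) <= set(library)


# Two dimensional array: every gene under every promoter.
two_d = full_array(genes, promoters)

# Four dimensional array: gene x promoter x spacer x terminator.
four_d = full_array(genes, promoters, spacers, terminators)

# An incomplete array simply lacks some of the combinations.
incomplete = two_d[:-1]
```

A library built this way grows multiplicatively with each added dimension, which is why an incomplete array may be sufficient in practice.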

The library may suitably be maintained in a host cell comprising prokaryotic cells or
eukaryotic cells. Preferred prokaryotic host organisms may include but are not
limited to Escherichia coli, Bacillus subtilis, Streptomyces lividans, Streptomyces
coelicolor, Pseudomonas aeruginosa, Myxococcus xanthus.

Yeast species such as Saccharomyces cerevisiae (budding yeast),
Schizosaccharomyces pombe (fission yeast), Pichia pastoris, and Hansenula
polymorpha (methylotrophic yeasts) may also be used. Filamentous ascomycetes,
such as Neurospora crassa and Aspergillus nidulans may also be used. Plant cells
such as those derived from Nicotiana and Arabidopsis are preferred. Preferred
mammalian host cells include but are not limited to those derived from humans,
monkeys and rodents, such as Chinese hamster ovary (CHO) cells, NIH/3T3, COS,
293, VERO, HeLa etc (see Kriegler M. in "Gene Transfer and Expression: A
Laboratory Manual", New York, Freeman & Co. 1990).

Concatemers 


A concatemer is a series of linked units. In the present context a concatemer is used
to denote a number of serially linked nucleotide cassettes, wherein at least two of
the serially linked nucleotide units comprise a cassette having the basic structure



[rs2-SP-PR-X-TR-SP-rs1]

wherein
rs1 and rs2 together denote a restriction site,
SP individually denotes a spacer of at least two nucleotide bases,
PR denotes a promoter, capable of functioning in a cell,
X denotes an expressible nucleotide sequence,
TR denotes a terminator, and
SP individually denotes a spacer of at least two nucleotide bases.

Optionally the cassettes comprise an intron sequence between the promoter and the
expressible nucleotide sequence and/or between the terminator and the expressible
sequence.
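
As an illustration only, the cassette layout can be modelled as a small data structure. The class name, field names and the IN label for an optional intron are assumptions made for this sketch; the only constraint taken from the text is that each spacer has at least two nucleotide bases.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cassette:
    """Illustrative model of a cassette [rs2-SP-PR-X-TR-SP-rs1]."""
    promoter: str               # PR
    expressible_sequence: str   # X
    terminator: str             # TR
    spacer_5: str               # SP next to rs2
    spacer_3: str               # SP next to rs1
    intron: Optional[str] = None  # optional, here placed between PR and X

    def __post_init__(self):
        # The text requires each spacer to have at least two nucleotide bases.
        for spacer in (self.spacer_5, self.spacer_3):
            if len(spacer) < 2:
                raise ValueError("each spacer needs at least two nucleotide bases")

    def layout(self):
        """Return the element order of this cassette as a bracketed string."""
        parts = ["rs2", "SP", "PR"]
        if self.intron is not None:
            parts.append("IN")  # hypothetical label for the optional intron
        parts += ["X", "TR", "SP", "rs1"]
        return "[" + "-".join(parts) + "]"


# Placeholder sequences, not real promoters or genes.
example = Cassette("TATAAT", "ATGAAATAA", "TTTTTT", "AC", "GT")
```

Calling `example.layout()` returns the basic structure given above, and passing an intron adds one element between the promoter and the expressible sequence.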



The expressible nucleotide sequence in the cassettes of the concatemer may 
comprise a DNA sequence selected from the group comprising cDNA and genomic 
DNA. 



wo 02/059330 



PCT/DK02/00058 




According to one aspect of the invention, a concatemer comprises cassettes with
expressible nucleotide sequences from different expression states, so that non-naturally
occurring combinations or non-native combinations of expressible nucleotide
sequences are obtained. These different expression states may represent at least
two different tissues, such as at least two organs, such as at least two species, such
as at least two genera. The different species may be from at least two different
phyla, such as from at least two different classes, such as from at least two
different divisions, more preferably from at least two different sub-kingdoms, such as
from at least two different kingdoms.

For example, the expressible nucleotide sequences may originate from eukaryotes
such as mammals such as humans, mice or whales, from reptiles such as snakes,
crocodiles or turtles, from tunicates such as sea squirts, from lepidoptera such as
butterflies and moths, from coelenterates such as jellyfish, anemones, or corals,
from fish such as bony and cartilaginous fish, from plants such as dicots, e.g. coffee,
oak or monocots such as grasses, lilies, and orchids; from lower plants such as
algae and gingko, from higher fungi such as terrestrial fruiting fungi, from marine
actinomycetes. The expressible nucleotide sequences may also originate from
protozoans such as malaria or trypanosomes, or from prokaryotes such as E. coli or
archaebacteria. Furthermore, the expressible nucleotide sequences may originate
from one or, more preferably, from more than one expression state from the species
and genera listed in the table below.

Bacteria: Streptomyces, Micromonospora, Nocardia, Actinomadura, Actinoplanes,
Streptosporangium, Microbispora, Kitasatosporia, Azobacterium, Rhizobium,
Achromobacterium, Enterobacterium, Brucella, Micrococcus, Lactobacillus, Bacillus
(B.t. toxins), Clostridium (toxins), Brevibacterium, Pseudomonas, Aerobacter, Vibrio,
Halobacterium, Mycoplasma, Cytophaga, Myxococcus

Fungi: Amanita muscaria (fly agaric, ibotenic acid, muscimol), Psilocybe (psilocybin),
Physarum, Fuligo, Mucor, Phytophthora, Rhizopus, Aspergillus, Penicillium
(penicillin), Coprinus, Phanerochaete, Acremonium (Cephalosporin), Trichoderma,
Helminthosporium, Fusarium, Alternaria, Myrothecium, Saccharomyces

Algae: Digenea simplex (kainic acid, antihelminthic), Laminaria angustata (laminine,
hypotensive)

Lichens: Usnea fasciata (vulpinic acid, antimicrobial; usnic acid, antitumor)

Higher Plants: Artemisia (artemisinin), Coleus (forskolin), Desmodium (K channel agonist),
Catharanthus (Vinca alkaloids), Digitalis (cardiac glycosides), Podophyllum
(podophyllotoxin), Taxus (taxol), Cephalotaxus (homoharringtonine), Camptotheca
(Camptothecin), Camellia sinensis (Tea), Cannabis indica, Cannabis sativa (Hemp),
Erythroxylum coca (Coca), Lophophora williamsii (Peyote), Myristica fragrans
(Nutmeg), Nicotiana, Papaver somniferum (Opium Poppy), Phalaris arundinacea
(Reed canary grass)

Protozoa: Ptychodiscus brevis; Dinoflagellates (brevitoxin, cardiovascular)

Sponges: Microciona prolifera (ectyonin, antimicrobial), Cryptotethya cryta (D-arabino
furanosides)

Coelenterata: Portuguese Man o War & other jellyfish and medusoid toxins.

Corals: Pseudoterogonia species (Pseudoteracins, anti-inflammatory), Erythropodium
(erythrolides, anti-inflammatory)

Aschelminths: Nematode secretory compounds

Molluscs: Conus toxins, sea slug toxins, cephalopod neurotransmitters, squid inks

Annelida: Lumbriconereis heteropa (nereistoxin, insecticidal)

Arachnids: Dolomedes ("fishing spider" venoms)

Crustacea: Xenobalanus (skin adhesives)

Insects: Epilachna (Mexican bean beetle alkaloids)

Spinunculida: Bonellia viridis (bonellin, neuroactive)

Bryozoans: Bugula neritina (bryostatins, anti-cancer)

Echinoderms: Crinoid chemistry

Tunicates: Trididemnum solidum (didemnin, anti-tumor and anti-viral); Ecteinascidia
turbinata (ecteinascidins, anti-tumor)

Vertebrates: Eptatretus stoutii (eptatretin, cardioactive), Trachinus draco (proteinaceous
toxins, reduce blood pressure, respiration and reduce heart rate), Dendrobatid frogs
(batrachotoxins, pumiliotoxins, histrionicotoxins, and other polyamines); Snake
venom toxins; Ornithorhynchus anatinus (duck-billed platypus venom), modified
carotenoids, retinoids and steroids; Avians: histrionicotoxins, modified carotenoids,
retinoids and steroids

According to a preferred embodiment of the invention the concatemer comprises at
least a first cassette and a second cassette, said first cassette being different from
said second cassette. More preferably, the concatemer comprises cassettes,
wherein substantially all cassettes are different. The difference between the
cassettes may arise from differences between promoters, and/or expressible
nucleotide sequences, and/or spacers, and/or terminators, and/or introns.

The number of cassettes in a single concatemer is largely determined by the host
species into which the concatemer is eventually to be inserted and the vector
through which the insertion is carried out. The concatemer thus may comprise at
least 10 cassettes, such as at least 15, for example at least 20, such as at least 25,
for example at least 30, such as from 30 to 60 or more than 60, such as at least 75,
for example at least 100, such as at least 200, for example at least 500, such as at
least 750, for example at least 1000, such as at least 1500, for example at least
2000 cassettes.

Each of the cassettes may be laid out as described above. 

Once the concatemer has been assembled or concatenated it may be ligated into a
suitable vector. Such a vector may advantageously comprise an artificial
chromosome. The basic requirements for a functional artificial chromosome have
been described in US 4,464,472, the contents of which are hereby incorporated by
reference. An artificial chromosome or a functional minichromosome, as it may also
be termed, must comprise a DNA sequence capable of replication and stable mitotic
maintenance in a host cell comprising a DNA segment coding for centromere-like
activity during mitosis of said host and a DNA sequence coding for a replication site
recognized by said host.

Suitable artificial chromosomes include a Yeast Artificial Chromosome (YAC) (see
e.g. Murray et al, Nature 305:189-193; or US 4,464,472), a mega Yeast Artificial
Chromosome (mega YAC), a Bacterial Artificial Chromosome (BAC), a mouse
artificial chromosome, a Mammalian Artificial Chromosome (MAC) (see e.g. US
6,133,503 or US 6,077,697), an Insect Artificial Chromosome (BUGAC), an Avian
Artificial Chromosome (AVAC), a Bacteriophage Artificial Chromosome, a
Baculovirus Artificial Chromosome, a plant artificial chromosome (US 5,270,201), a
BIBAC vector (US 5,977,439) or a Human Artificial Chromosome (HAC).

The artificial chromosome is preferably so large that the host cell perceives it as a
"real" chromosome and maintains it and transmits it as a chromosome. For yeast
and other suitable host species, this will often correspond approximately to the size
of the smallest native chromosome in the species. For Saccharomyces, the smallest
chromosome has a size of 225 Kb.
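
To give a feel for the numbers, the following sketch estimates how many cassettes would be needed for a concatemer-bearing artificial chromosome to reach the 225 Kb figure quoted above. The average cassette length and the vector arm length are hypothetical values chosen for illustration; they are not taken from the text.

```python
import math

# Size quoted in the text for the smallest Saccharomyces chromosome.
SMALLEST_YEAST_CHROMOSOME_BP = 225_000


def cassettes_needed(target_bp, cassette_bp, vector_arms_bp=0):
    """Smallest number of cassettes so that arms plus cassettes reach target_bp."""
    remaining = max(target_bp - vector_arms_bp, 0)
    return math.ceil(remaining / cassette_bp)


# Assumed average cassette length of 3 kb (hypothetical).
without_arms = cassettes_needed(SMALLEST_YEAST_CHROMOSOME_BP, 3_000)

# Same estimate when 10 kb of vector arms (hypothetical) count towards the total.
with_arms = cassettes_needed(SMALLEST_YEAST_CHROMOSOME_BP, 3_000, vector_arms_bp=10_000)
```

Under these assumed lengths the estimate falls within the range of cassette numbers recited for the concatemer.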

MACs may be used to construct artificial chromosomes from other species, such as
insect and fish species. The artificial chromosomes preferably are fully functional
stable chromosomes. Two types of artificial chromosomes may be used. One type,
referred to as SATACs [satellite artificial chromosomes], are stable heterochromatic
chromosomes, and the other type are minichromosomes based on amplification of
euchromatin.

Mammalian artificial chromosomes provide extra-genomic specific integration sites
for introduction of genes encoding proteins of interest and permit megabase size
DNA integration, such as integration of concatemers according to the invention.

According to another embodiment of the invention, the concatemer may be
integrated into the host chromosomes or cloned into other types of vectors, such as
a plasmid vector, a phage vector, a viral vector or a cosmid vector.

A preferable artificial chromosome vector is one that is capable of being
conditionally amplified in the host cell, e.g. in yeast. The amplification preferably is at
least a 10 fold amplification. Furthermore, it is advantageous that the cloning site of
the artificial chromosome vector can be modified to comprise the same restriction
site as the one bordering the cassettes described above, i.e. RS2 and/or RS2'.

Concatenation

Cassettes to be concatenated are normally excised from a vector either by digestion
with restriction enzymes or by PCR. After excision the cassettes may be separated
from the vector through size fractionation such as gel filtration or through tagging of
known sequences in the cassettes. The isolated cassettes may then be joined
together either through interaction between sticky ends or through ligation of blunt
ends.

Single-stranded compatible ends may be created by digestion with restriction
enzymes. For concatenation a preferred enzyme for excising the cassettes would be a
rare cutter, i.e. an enzyme that recognises a sequence of 7 or more nucleotides.
Examples of enzymes that cut very rarely are the meganucleases, many of which
are intron encoded, like e.g. I-Ceu I, I-Sce I, I-Ppo I, and PI-Psp I (see example 6d for
more). Other preferred enzymes recognize a sequence of 8 nucleotides like e.g. Asc
I, AsiS I, CciN I, CspB I, Fse I, MchA I, Not I, Pac I, Sbf I, Sda I, Sgf I, SgrA I,
Sse232 I, and Sse8387 I, all of which create single stranded, palindromic compatible
ends.
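
Why longer recognition sequences make an enzyme a rare cutter can be seen from a simple estimate. The sketch below assumes a random sequence in which the four bases are equally frequent, which real genomes only approximate; the function names are illustrative.

```python
def expected_spacing_bp(site_length):
    """Average distance between sites of the given length in random DNA.

    Assumes equal and independent base frequencies, so a specific site of
    n bases is expected once every 4**n positions.
    """
    return 4 ** site_length


def expected_sites(sequence_length_bp, site_length):
    """Expected number of occurrences of one specific site in a sequence."""
    return sequence_length_bp / expected_spacing_bp(site_length)


six_cutter = expected_spacing_bp(6)    # typical cloning enzyme
seven_cutter = expected_spacing_bp(7)
eight_cutter = expected_spacing_bp(8)  # e.g. the 8 nucleotide sites listed above
```

Under this assumption an 8 nucleotide site is expected sixteen times less often than a 6 nucleotide site, which lowers the chance that the enzyme also cuts inside a cassette.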

Other preferred rare cutters, which may also be used to control orientation of
individual cassettes in the concatemer, are enzymes that recognize non-palindromic
sequences like e.g. Aar I, Sap I, Sfi I, Sdi I, and Vpa (see example 6c for more).
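
The distinction between palindromic and non-palindromic recognition sequences can be checked by comparing a site with its reverse complement. This is a minimal sketch; the recognition sequences used for Not I, Asc I and Sap I are the commonly documented ones and are not stated in the text, so they should be treated as assumptions here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site):
    """A site is palindromic if it equals its own reverse complement."""
    site = site.upper()
    return site == reverse_complement(site)


# Commonly documented recognition sequences (assumed, not from the text).
SITES = {
    "Not I": "GCGGCCGC",
    "Asc I": "GGCGCGCC",
    "Sap I": "GCTCTTC",
}

palindromic = {name: is_palindromic(site) for name, site in SITES.items()}
```

Sites that read the same on both strands cannot distinguish the two orientations of a cassette, whereas a non-palindromic site such as the one assumed for Sap I differs from its reverse complement and can therefore be used to define orientation.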

Alternatively, cassettes can be prepared by the addition of restriction sites to the
ends, e.g. by PCR or ligation to linkers (short synthetic dsDNA molecules).
Restriction enzymes are continuously being isolated and characterised and it is
anticipated that many of such novel enzymes can be used to generate single-
stranded compatible ends according to the present invention.

It is conceivable that single stranded compatible ends can be made by cleaving the
vector with synthetic cutters. Thus, a reactive chemical group that will normally be
able to cleave DNA unspecifically can cut at specific positions when coupled to
another molecule that recognises and binds to specific sequences. Examples of
molecules that recognise specific dsDNA sequences are DNA, PNA, LNA,
phosphorothioates, peptides, and amides. See e.g. Armitage, B. (1998) Chem. Rev.
98: 1171-1200, who describes photocleavage using e.g. anthraquinone and UV
light; Dervan P.B. & Bürli R.W. (1999) Curr. Opin. Chem. Biol. 3: 688-93, which describes
the specific binding of polyamides to DNA; Nielsen, P.E. (2001) Curr. Opin.
Biotechnol. 12: 16-20, which describes the specific binding of PNA to DNA; and Chemical
Reviews special thematic issue: RNA/DNA Cleavage (1998) vol. 98 (3) Bashkin J.K.
(ed.) ACS publications, which describes several examples of chemical DNA cleavers.

Single-stranded compatible ends may also be created by using e.g. PCR primers
including dUTP and then treating the PCR product with Uracil-DNA glycosylase
(Ref: US 5,035,996) to degrade part of the primer. Alternatively, compatible ends
can be created by tailing both the vector and insert with complementary nucleotides
using Terminal Transferase (Chang, LMS, Bollum TJ (1971) J Biol Chem 246:909).

It is also conceivable that recombination can be used to generate concatemers, e.g.
through the modification of techniques like the Creator™ system (Clontech) which
uses the Cre-loxP mechanism (Sauer B 1993 Methods Enzymol 225:890-900) to
directionally join DNA molecules by recombination or like the Gateway™ system
(Life Technologies, US 5,888,732) using lambda att attachment sites for directional
recombination (Landy A 1989, Ann Rev Biochem 58:913). It is envisaged that also
lambda cos site dependent systems can be developed to allow concatenation.

More preferably the cassettes may be concatenated without an intervening
purification step through excision from a vector with two restriction enzymes, one
leaving sticky ends on the cassettes and the other one leaving blunt ends in the
vectors. This is the preferred method for concatenation of cassettes from vectors
having the basic structure of [RS1-RS2-SP-PR-X-TR-SP-RS2'-RS1'].

An alternative way of producing concatemers free of vector sequences would be to
PCR amplify the cassettes from a single stranded primary vector. The PCR product
must include the restriction sites RS2 and RS2' which are subsequently cleaved by
their cognate enzyme(s). Concatenation can then be performed using the digested
PCR product, essentially without interference from the single stranded primary
vector template or the small double stranded fragments, which have been cut from
the ends.



The concatemer may be assembled or concatenated by concatenation of at least
two cassettes of nucleotide sequences each cassette comprising a first sticky end, a
spacer sequence, a promoter, an expressible nucleotide sequence, a terminator, a
spacer sequence, and a second sticky end. A flow chart of the procedure is shown
in figure 2a.

Preferably concatenation further comprises

starting from a primary vector [RS1-RS2-SP-PR-X-TR-SP-RS2'-RS1'],
wherein X denotes an expressible nucleotide sequence,
RS1 and RS1' denote restriction sites,
RS2 and RS2' denote restriction sites different from RS1 and RS1',
SP individually denotes a spacer sequence of at least two nucleotides,
PR denotes a promoter,
TR denotes a terminator,
i) cutting the primary vector with the aid of at least one restriction
enzyme specific for RS2 and RS2' obtaining cassettes having the
general formula [rs2-SP-PR-X-TR-SP-rs1] wherein rs1 and rs2 together
denote a functional restriction site RS2 or RS2',
ii) assembling the cut out cassettes through interaction between rs1 and
rs2.
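
Steps i) and ii) can be sketched on the top strand only. In this illustration Asc I stands in for RS2 and RS2'; its recognition sequence and cut position (GG^CGCGCC) are the commonly documented ones and are assumptions here, as are the placeholder payload sequences and flanking bases. The sketch ignores the bottom strand and cassette orientation.

```python
SITE = "GGCGCGCC"  # Asc I recognition sequence, standing in for RS2 / RS2'
CUT_OFFSET = 2     # assumed top-strand cut position within the site: GG^CGCGCC


def make_primary_vector(payload):
    """Place a payload (SP-PR-X-TR-SP) between two sites, with dummy flanks."""
    return "AAAA" + SITE + payload + SITE + "TTTT"


def excise_cassettes(vector):
    """Step i): cut at every site and return the fragments between cuts."""
    cuts = []
    pos = vector.find(SITE)
    while pos != -1:
        cuts.append(pos + CUT_OFFSET)
        pos = vector.find(SITE, pos + 1)
    return [vector[a:b] for a, b in zip(cuts, cuts[1:])]


def concatenate(cassettes):
    """Step ii): join cassettes end to end (top strand, one orientation)."""
    return "".join(cassettes)


# Placeholder payloads, not real promoters or genes.
payloads = ["ACTATAATATGAAATAATTTTTTAC",
            "ACTATAATATGCCCTAATTTTTTAC",
            "ACTATAATATGGGGTAATTTTTTAC"]
cassettes = [excise_cassettes(make_primary_vector(p))[0] for p in payloads]
concatemer = concatenate(cassettes)
```

Each excised cassette carries one part of the site at either end (rs2 and rs1 in the notation above), and every junction in the concatemer regenerates a complete site, so `concatemer.count(SITE)` equals the number of cassettes minus one.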

In this way at least 10 cassettes can be concatenated, such as at least 15, for
example at least 20, such as at least 25, for example at least 30, such as from 30 to
60 or more than 60, such as at least 75, for example at least 100, such as at least
200, for example at least 500, such as at least 750, for example at least 1000, such
as at least 1500, for example at least 2000.

According to an especially preferred embodiment, vector arms each having a RS2
or RS2' in one end and a non-complementary overhang or a blunt end in the other
end are added to the concatenation mixture together with the cassettes described
above to further simplify the procedure (see Fig. 2b). One example of a suitable
vector for providing vector arms is disclosed in Fig. 7. TRP1, URA3, and HIS3 are
auxotrophic marker genes, and AmpR is an E. coli antibiotic marker gene. CEN4 is
a centromere and TEL are telomeres. ARS1 and PMB1 allow replication in yeast and
E. coli respectively. BamH I and Asc I are restriction enzyme recognition sites. The
nucleotide sequence of the vector is set forth in SEQ ID NO 4. The vector is
digested with BamHI and AscI to liberate the vector arms, which are used for ligation
to the concatemer.

The ratio of vector arms to cassettes determines the maximum number of cassettes
in the concatemer as illustrated in figure 8. The vector arms preferably are artificial
chromosome vector arms such as those described in Fig. 7.
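
One crude way to see why this ratio matters is a counting argument. The sketch below assumes, purely for illustration, that every cassette is incorporated and that every concatemer ends up capped by one vector arm at each end; it is not the relationship shown in figure 8, and the function name is hypothetical.

```python
def mean_cassettes_per_concatemer(n_cassettes, n_vector_arms):
    """Average cassettes per concatemer if each concatemer uses two arms.

    Assumes complete ligation: all cassettes are incorporated and each
    finished concatemer is capped by exactly two vector arms.
    """
    if n_vector_arms < 2:
        raise ValueError("at least two vector arms are needed to cap one concatemer")
    capped_concatemers = n_vector_arms / 2
    return n_cassettes / capped_concatemers


few_arms = mean_cassettes_per_concatemer(100, 10)   # 10 cassettes per arm
many_arms = mean_cassettes_per_concatemer(100, 50)  # 2 cassettes per arm
```

Under these assumptions, lowering the proportion of vector arms gives longer concatemers on average, and raising it gives shorter ones.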

It is of course also possible to add stopper fragments to the concatenation solution,
the stopper fragments each having a RS2 or RS2' in one end and a non-complementary
overhang or a blunt end in the other end. The ratio of stopper
fragments to cassettes can likewise control the maximum size of the concatemer.

The complete sequence of steps to be taken when starting with the isolation of
mRNA until inserting into an entry vector may include the following steps

i) isolating mRNA from an expression state,
ii) obtaining substantially full length cDNA corresponding to the mRNA
sequences,
iii) inserting the substantially full length cDNA into a cloning site in a
cassette in a primary vector, said cassette being of the general
formula in 5'→3' direction:
[RS1-RS2-SP-PR-CS-TR-SP-RS2'-RS1']
wherein CS denotes a cloning site.

In preparation of the concatemer, genes may be isolated from different entry
libraries to provide the desired selection of genes. Accordingly, concatenation may
further comprise selection of vectors having expressible nucleotide sequences from
at least two different expression states, such as from two different species. The two
different species may be from two different classes, such as from two different
divisions, more preferably from two different sub-kingdoms, such as from two
different kingdoms.

As an alternative to including vector arms in the concatenation reaction it is possible
to ligate the concatemer into an artificial chromosome selected from the group
comprising yeast artificial chromosome, mega yeast artificial chromosome, bacterial
artificial chromosome, mouse artificial chromosome, human artificial chromosome.

Preferably at least one inserted concatemer further comprises a selectable marker.
The marker(s) are conveniently not included in the concatemer as such but rather in
an artificial chromosome vector, into which the concatemer is inserted. Selectable
markers generally provide a means to select, for growth, only those cells which
contain a vector. Such markers are of two types: drug resistance and auxotrophy. A
drug resistance marker enables cells to grow in the presence of an otherwise toxic
compound. Auxotrophic markers allow cells to grow in media lacking an essential
component by enabling cells to synthesise the essential component (usually an
amino acid).

Illustrative and non-limiting examples of common compounds for which selectable
markers are available with a brief description of their mode of action follow:

Prokaryotic

• Ampicillin: interferes with a terminal reaction in bacterial cell wall synthesis.
The resistance gene (bla) encodes beta-lactamase which cleaves the beta-
lactam ring of the antibiotic thus detoxifying it.

• Tetracycline: prevents bacterial protein synthesis by binding to the 30S
ribosomal subunit. The resistance gene (tet) specifies a protein that modifies
the bacterial membrane and prevents accumulation of the antibiotic in the
cell.

• Kanamycin: binds to the 70S ribosomes and causes misreading of
messenger RNA. The resistance gene (nptII) modifies the antibiotic and
prevents interaction with the ribosome.

• Streptomycin: binds to the 30S ribosomal subunit, causing misreading of
messenger RNA. The resistance gene (Sm) modifies the antibiotic and
prevents interaction with the ribosome.

• Zeocin: this new bleomycin-family antibiotic intercalates into the DNA and
cleaves it. The Zeocin resistance gene encodes a 13,665 dalton protein. This
protein confers resistance to Zeocin by binding to the antibiotic and
preventing it from binding DNA. Zeocin is effective on most aerobic cells and
can be used for selection in mammalian cell lines, yeast, and bacteria.

Eukaryotic

• Hygromycin: an aminocyclitol that inhibits protein synthesis by disrupting
ribosome translocation and promoting mistranslation. The resistance gene
(hph) detoxifies hygromycin B by phosphorylation.

• Histidinol: cytotoxic to mammalian cells by inhibiting histidyl-tRNA synthesis
in histidine free media. The resistance gene (hisD) product inactivates
histidinol toxicity by converting it to the essential amino acid, histidine.

• Neomycin (G418): blocks protein synthesis by interfering with ribosomal
functions. The resistance gene ADH encodes aminoglycoside
phosphotransferase which detoxifies G418.

• Uracil: Laboratory yeast strains carrying a mutated gene which encodes
orotidine-5'-phosphate decarboxylase, an enzyme essential for uracil
biosynthesis, are unable to grow in the absence of exogenous uracil. A copy
of the wild-type gene (ura4+, S. pombe or URA3, S. cerevisiae) carried on
the vector will complement this defect in transformed cells.

• Adenosine: Laboratory strains carrying a deficiency in adenosine synthesis
may be complemented by a vector carrying the wild type gene, ADE2.

• Amino acids: Vectors carrying the wild-type genes for LEU2, TRP1, HIS3 or
LYS2 may be used to complement strains of yeast deficient in these genes.

• Zeocin: this new bleomycin-family antibiotic intercalates into the DNA and
cleaves it. The Zeocin resistance gene encodes a 13,665 dalton protein. This
protein confers resistance to Zeocin by binding to the antibiotic and
preventing it from binding DNA. Zeocin is effective on most aerobic cells and
can be used for selection in mammalian cell lines, yeast, and bacteria.



Transgenic cells 



In one aspect of the invention, the concatemers comprising the multitude of
cassettes are introduced into a host cell, in which the concatemers can be
maintained and the expressible nucleotide sequences can be expressed in a
co-ordinated way. The cassettes comprised in the concatemers may be isolated from
the host cell and re-assembled due to their uniform structure with, preferably,
concatemer restriction sites between the cassettes.

The host cells selected for this purpose are preferably cultivable under standard
laboratory conditions using standard culture conditions, such as standard media and
protocols. Preferably the host cells comprise a substantially stable cell line, in which
the concatemers can be maintained for generations of cell division. Standard
techniques for transformation of the host cells and in particular methods for insertion
of artificial chromosomes into the host cells are known.

It is also of advantage if the host cells are capable of undergoing meiosis to perform
sexual recombination. It is also advantageous that meiosis is controllable through
external manipulations of the cell culture. One especially advantageous host cell
type is one where the cells can be manipulated through external manipulations into
different mating types.

The genomes of a number of species have already been sequenced more or less
completely and the sequences can be found in databases. The list of species for
which the whole genome has been sequenced increases constantly. Preferably the
host cell is selected from the group of species, for which the whole genome or
essentially the whole genome has been sequenced. The host cell should preferably
be selected from a species that is well described in the literature with respect to
genetics, metabolism, physiology such as model organisms used for genomics
research.

The host organism should preferably be conditionally deficient in the ability to
undergo homologous recombination. The host organism should preferably have a
codon usage similar to that of the donor organisms. Furthermore, in the case of
genomic DNA, if eukaryotic donor organisms are used, it is preferable that the host
organism has the ability to process the donor messenger RNA properly, e.g., splice
out introns.

The host cells can be bacterial, archaebacterial, or eukaryotic and can constitute a
homogeneous cell line or mixed culture. Suitable cells include the bacterial and
eukaryotic cell lines commonly used in genetic engineering and protein expression.

Preferred prokaryotic host organisms may include but are not limited to Escherichia
coli, Bacillus subtilis, B. licheniformis, B. cereus, Streptomyces lividans,
Streptomyces coelicolor, Pseudomonas aeruginosa, Myxococcus xanthus,
Rhodococcus, Streptomycetes, Actinomycetes, Corynebacteria, Bacillus,
Pseudomonas, Salmonella, and Erwinia. The complete genome sequences of E.
coli and Bacillus subtilis are described by Blattner et al., Science 277, 1454-1462
(1997) and Kunst et al., Nature 390, 249-256 (1997).

Preferred eukaryotic host organisms are mammals, fish, insects, plants, algae and 
fungi. 

Examples of mammalian cells include those from, e.g., monkey, mouse, rat,
hamster, primate, and human, both cell lines and primary cultures. Preferred
mammalian host cells include but are not limited to those derived from humans,
monkeys and rodents, such as Chinese hamster ovary (CHO) cells, NIH/3T3, COS,
293, VERO, HeLa etc (see Kriegler M. in "Gene Transfer and Expression: A
Laboratory Manual", New York, Freeman & Co. 1990), and stem cells, including
embryonic stem cells and hemopoietic stem cells, zygotes, fibroblasts, lymphocytes,
kidney, liver, muscle, and skin cells.

Examples of insect cells include baculo lepidoptera.

Examples of plant cells include maize, rice, wheat, cotton, soybean, and sugarcane.
Plant cells such as those derived from Nicotiana and Arabidopsis are preferred.

Examples of fungi include penicillium, aspergillus, such as Aspergillus nidulans,
podospora, neurospora, such as Neurospora crassa, saccharomyces, such as
Saccharomyces cerevisiae (budding yeast), Schizosaccharomyces, such as
Schizosaccharomyces pombe (fission yeast), Pichia spp, such as Pichia pastoris,
and Hansenula polymorpha (methylotrophic yeasts).

In a preferred embodiment the host cell is a yeast cell, and an illustrative and not
limiting list of suitable yeast host cells comprises: baker's yeast, Kluyveromyces
marxianus, K. lactis, Candida utilis, Phaffia rhodozyma, Saccharomyces boulardii,
Pichia pastoris, Hansenula polymorpha, Yarrowia lipolytica, Candida paraffinica,
Schwanniomyces castellii, Pichia stipitis, Candida shehatae, Rhodotorula glutinis,
Lipomyces lipofer, Cryptococcus curvatus, Candida spp. (e.g. C. palmioleophila),
Yarrowia lipolytica, Candida guilliermondii, Candida, Rhodotorula spp.,
Saccharomycopsis spp., Aureobasidium pullulans, Candida brumptii, Candida
hydrocarbofumarica, Torulopsis, Candida tropicalis, Saccharomyces cerevisiae,
Rhodotorula rubra, Candida flaveri, Eremothecium ashbyii, Pichia spp., Pichia
pastoris, Kluyveromyces, Hansenula, Kloeckera, Pichia, Pachysolen spp., or
Torulopsis bombicola.

The choice of host will depend on a number of factors related to the intended
use of the engineered host, including pathogenicity, substrate range, environmental
hardiness, presence of key intermediates, ease of genetic manipulation, and
likelihood of promiscuous transfer of genetic information to other organisms.
Particularly advantageous hosts are E. coli, lactobacilli, Streptomycetes,
Actinomycetes, Saccharomyces and filamentous fungi.

In any one host cell it is possible to make all sorts of combinations of expressible
nucleotide sequences from all possible sources. Furthermore, it is possible to make
combinations of promoters and/or spacers and/or introns and/or terminators in
combination with one and the same expressible nucleotide sequence.

Thus in any one cell there may be expressible nucleotide sequences from two
different expression states. Furthermore, these two different expression states may
be from one species or advantageously from two different species. Any one host cell
may also comprise expressible nucleotide sequences from at least three species,
such as from at least four, five, six, seven, eight, nine or ten species, or from more
than 15 species such as from more than 20 species, for example from more than 30,
40 or 50 species, such as from more than 100 different species, for example from
more than 300 different species, such as from more than 500 different species, for
example from more than 1000 different species, thereby obtaining combinations of
large numbers of expressible nucleotide sequences from a large number of species.

In this way potentially unlimited numbers of combinations of expressible nucleotide
sequences can be combined across different expression states. These different
expression states may represent at least two different tissues, such as at least two
organs, such as at least two species, such as at least two genera. The different
species may be from at least two different phyla, such as from at least two different
classes, such as from at least two different divisions, more preferably from at least
two different sub-kingdoms, such as from at least two different kingdoms.

Any two of these species may be from two different classes, such as from two
different divisions, more preferably from two different sub-kingdoms, such as from
two different kingdoms. Thus expressible nucleotide sequences may be combined
from a eukaryote and a prokaryote into one and the same cell.

According to another embodiment of the invention, the expressible nucleotide
sequences may be from one and the same expression state. The products of these
sequences may interact with the products of the genes in the host cell and form new
enzyme combinations leading to novel biochemical pathways. Furthermore, by
putting the expressible nucleotide sequences under the control of a number of
promoters it becomes possible to switch on and off groups of genes in a co-ordinated
manner. By doing this with expressible nucleotide sequences from only
one expression state, novel combinations of genes are also expressed.

The number of concatemers in one single cell may be at least one concatemer per
cell, preferably at least 2 concatemers per cell, more preferably 3 per cell, such as 4
per cell, more preferably 5 per cell, such as at least 5 per cell, for example at least 6
per cell, such as 7, 8, 9 or 10 per cell, for example more than 10 per cell. As
described above, each concatemer may preferably comprise up to 1000 cassettes,
and it is envisaged that one concatemer may comprise up to 2000 cassettes. By
inserting up to 10 concatemers into one single cell, this cell may thus be enriched
with up to 20,000 heterologous expressible genes, which under suitable conditions
may be turned on and off by regulation of the regulatable promoters.
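
The upper bound quoted above is a simple product of concatemers per cell and cassettes per concatemer. The following minimal sketch reproduces that arithmetic; the function name is illustrative only, and it assumes one expressible gene per cassette.

```python
def max_heterologous_genes(concatemers_per_cell: int, cassettes_per_concatemer: int) -> int:
    """Upper bound on heterologous genes per cell.

    Assumption (not stated as a rule in the text): each cassette carries
    exactly one expressible gene.
    """
    return concatemers_per_cell * cassettes_per_concatemer


# Figures quoted in the text: up to 10 concatemers of up to 2000 cassettes each.
upper_bound = max_heterologous_genes(10, 2000)
# The same calculation with the preferred limit of 1000 cassettes per concatemer.
preferred_bound = max_heterologous_genes(10, 1000)
print(upper_bound, preferred_bound)
```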

Often it is more preferable to provide cells having anywhere between 10 and 1000
heterologous genes, such as 20 to 900 heterologous genes, for example 30 to 800
heterologous genes, such as 40 to 700 heterologous genes, for example 50 to 600
heterologous genes, such as from 60 to 300 heterologous genes or from 100 to 400
heterologous genes which are inserted as 2 to 4 artificial chromosomes each
containing one concatemer of genes. The genes may advantageously be located on
1 to 10, such as from 2 to 5, different concatemers in the cells. Each concatemer may
advantageously comprise from 10 to 1000 genes, such as from 10 to 750 genes,
such as from 10 to 500 genes, such as from 10 to 200 genes, such as from 20 to
100 genes, for example from 30 to 60 genes, or from 50 to 100 genes.

The concatemers may be inserted into the host cells according to any known
transformation technique, preferably according to such transformation techniques
that ensure stable and not transient transformation of the host cell. The concatemers
may thus be inserted as an artificial chromosome which is replicated by the cells as
they divide or they may be inserted into the chromosomes of the host cell. The
concatemer may also be inserted in the form of a plasmid such as a plasmid vector,
a phage vector, a viral vector, a cosmid vector, that is replicated by the cells as they
divide. Any combination of the three insertion methods is also possible. One or more
concatemers may thus be integrated into the chromosome(s) of the host cell and
one or more concatemers may be inserted as plasmids or artificial chromosomes.
One or more concatemers may be inserted as artificial chromosomes and one or
more may be inserted into the same cell via a plasmid.

Examples

Example 1

In Examples 1-3 an AscI site was introduced into the EcoRI site in pYAC4
(Sigma; Burke DT et al. 1987, Science vol 236, p 806), so that the sticky ends match the
AscI site (= RS2 in the general formula of this patent) of the cassettes in pEVE vectors.

Preparation of EVACs (EVolvable Artificial Chromosomes) including size fractioning

Preparation of pYAC4-Asc arms

1. inoculate 150 ml of LB (Sigma) with a single colony of E. coli DH5α containing
   pYAC4-Asc
2. grow to OD600 ~ 1, harvest cells and make plasmid preparation
3. digest 100 µg pYAC4-Asc w. BamHI and AscI
4. dephosphorylate fragments and heat inactivate phosphatase (20 min, 80 °C)
5. purify fragments (e.g. Qiaquick Gel Extraction Kit)
6. run 1% agarose gel to estimate amount of fragment

Preparation of expression cassettes

1. take 100 µg of plasmid preparation from each of the following libraries
   a) pMA-CAR
   b) pCA-CAR
   c) Phaffia cDNA library
   d) Carrot cDNA library
2. digest w. SrfI (10 units/prep, 37 °C overnight)
3. dephosphorylate (10 units/prep, 37 °C, 2 h)
4. heat inactivate 80 °C, 20 min
5. concentrate and change buffer (precipitation or ultrafiltration)
6. digest w. AscI (10 units/prep, 37 °C overnight)
7. adjust volume of preps to 100 µL

Preparation of EVACs

Different types of EVACs have been made by varying the ratio of the different
libraries that go into the ligation reaction.

EVAC    pMA-CAR    pCA-CAR    Phaffia cDNA    Carrot cDNA
A       40%        40%        10%             10%
B       25%        25%        25%             25%

1. add ~100 ng arms of pYAC4-Asc / 100 µg of cassette mixture
2. concentrate to < 33.5 µL
3. add 2.5 units of T4 DNA ligase + 4 µL 10x ligase buffer. Adjust to 40 µL
4. ligate 3 h, 16 °C
5. stop reaction by adding 2 µL of 500 mM EDTA
6. bring reaction volume to 125 µL, add 25 µL loading mix, heat at 60 °C for 5 min
7. distribute evenly in 10 wells of a 1% LMP agarose gel
8. run pulsed field gel (CHEF III, 1% LMP agarose, ½ strength TBE (BioRad),
   angle 120, temperature 12 °C, voltage 5.6 V/cm, switch time ramping 5 - 25 s,
   run time 30 h)
9. stain part of the gel that contains molecular weight markers + 1 sample lane
   for quality check
10. cut remaining 9 sample lanes corresponding to mw. 97 - 194 kb (fraction 1);
    194 - 291 kb (fraction 2); 291 - 365 kb (fraction 3) from the gel
11. agarase gel in high NaCl agarase buffer, 1 U agarase / 100 mg gel, 40 °C, 3 h
12. concentrate preparation to < 20 µL
13. transform suitable yeast strain w. preparation using alkali/cation transformation
14. plate on selective minimal media plates
15. incubate 30 °C for 4-5 days
16. pick colonies
17. analyse colonies
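
The library proportions for EVAC types A and B can be turned into amounts for a 100 µg cassette mixture as sketched below. This is only an illustration: the text does not say whether the percentages are by mass or by moles, so the sketch assumes mass fractions, and the names EVAC_RATIOS and library_masses_ug are hypothetical.

```python
# Library proportions for the two EVAC types described in Example 1.
# Assumption: the percentages are mass fractions of the cassette mixture.
EVAC_RATIOS = {
    "A": {"pMA-CAR": 0.40, "pCA-CAR": 0.40, "Phaffia cDNA": 0.10, "Carrot cDNA": 0.10},
    "B": {"pMA-CAR": 0.25, "pCA-CAR": 0.25, "Phaffia cDNA": 0.25, "Carrot cDNA": 0.25},
}


def library_masses_ug(evac_type: str, total_ug: float = 100.0) -> dict:
    """Return the mass (in µg) of each library needed for one cassette mixture."""
    ratios = EVAC_RATIOS[evac_type]
    # Guard against a ratio table that does not add up to 100%.
    if abs(sum(ratios.values()) - 1.0) > 1e-9:
        raise ValueError("library fractions must sum to 1")
    return {library: round(total_ug * fraction, 3) for library, fraction in ratios.items()}


for evac in ("A", "B"):
    print(evac, library_masses_ug(evac))
```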



Example 2

Preparation of EVACs (EVolvable Artificial Chromosomes) with direct transformation

Preparation of pYAC4-Asc arms

1. inoculate 150 ml of LB with a single colony of DH5α containing pYAC4-Asc
2. grow to OD600 ~ 1, harvest cells and make plasmid preparation
3. digest 100 µg pYAC4-Asc w. BamHI and AscI
4. dephosphorylate fragments and heat inactivate phosphatase (20 min, 80 °C)
5. purify fragments (e.g. Qiaquick Gel Extraction Kit)
6. run 1% agarose gel to estimate amount of fragment

Preparation of expression cassettes

1. take 100 µg of plasmid preparation from each of the following libraries
   a) pMA-CAR
   b) pCA-CAR
   c) Phaffia cDNA library
   d) Carrot cDNA library
2. digest w. SrfI (10 units/prep, 37 °C overnight)
3. dephosphorylate (10 units/prep, 37 °C, 2 h)
4. heat inactivate 80 °C, 20 min
5. concentrate and change buffer (precipitation or ultrafiltration)
6. digest w. AscI (10 units/prep, 37 °C overnight)
7. adjust volume of preps to 100 µL

Preparation of EVACs

Different types of EVACs have been made by varying the ratio of the different
libraries that go into the ligation reaction.

EVAC    pMA-CAR    pCA-CAR    Phaffia cDNA    Carrot cDNA
A       40%        40%        10%             10%
B       25%        25%        25%             25%

1. concentrate to < 32 µL
2. add 1 unit of T4 DNA ligase + 4 µL 10x ligase buffer. Adjust to 40 µL
3. ligate 2 h, 16 °C
4. stop reaction by adding 2 µL of 500 mM EDTA, heat inactivate 60 °C, 20 min
5. bring reaction volume to 500 µL with dH2O, concentrate to 30 µL
6. add 10 U AscI, 4 µL 10X AscI buffer, bring to 40 µL
7. incubate at 37 °C for 1 h (alternatively 15 min, 30 min)
8. heat inactivate 60 °C, 20 min
9. add 2 µg YAC4-Asc arms, 1 U T4 DNA ligase, 10 µL 10X ligase buffer, bring
   to 100 µL
10. incubate ON, 16 °C
11. add water to 500 µL
12. concentrate to 25 µL
13. transform suitable yeast strain w. preparation using alkali/cation transformation
    or other suitable transformation method
14. plate on selective minimal media plates
15. incubate 30 °C for 4-5 days
16. pick colonies
17. analyse colonies

Example 3

Preparation of EVACs (EVolvable Artificial Chromosomes) (Small scale preparation)

Preparation of expression cassettes

1. inoculate 5 ml of LB medium (Sigma) with library inoculum corresponding to a
   10+ fold representation of library. Grow overnight
2. make plasmid miniprep from 1.5 ml of culture (e.g. Qiaprep spin miniprep kit)
3. digest plasmid w. SrfI
4. dephosphorylate fragments and heat inactivate phosphatase (20 min, 80 °C)
5. digest w. AscI
6. run 1/10 of reaction in 1% agarose to estimate amount of fragment

Preparation of pYAC4-Asc arms

1. inoculate 150 ml of LB with a single colony of E. coli DH5α containing pYAC4-Asc
2. grow to OD600 ~ 1, harvest cells and make plasmid preparation
3. digest 100 µg pYAC4-Asc w. BamHI and AscI
4. dephosphorylate fragments and heat inactivate phosphatase (20 min, 80 °C)
5. purify fragments (e.g. Qiaquick Gel Extraction Kit)
6. run 1% agarose gel to estimate amount of fragment

Preparation of EVACs

1. mix expression cassette fragments with YAC arms so that cassette/arm ratio is
   ~1000/1
2. if needed concentrate mixture (use e.g. Microcon YM30) so fragment concentration
   > 75 ng/µL reaction
3. add 1 U T4 DNA ligase, incubate 16 °C, 1-3 h. Stop reaction by adding 1 µL of
   500 mM EDTA
4. run pulsed field gel (CHEF III, 1% LMP agarose, ½ strength TBE, angle 120,
   temperature 12 °C, voltage 5.6 V/cm, switch time ramping 5 - 25 s, run time 30 h).
   Load sample in 2 lanes.
5. stain part of the gel that contains molecular weight markers
6. cut sample lanes corresponding to mw. 100 - 200 kb
7. agarase gel in high NaCl agarase buffer, 1 U agarase / 100 mg gel
8. concentrate preparation to < 20 µL
9. transform suitable yeast strain w. preparation using electroporation
10. plate on selective minimal media plates
11. incubate 30 °C for 4-5 days
12. pick colonies
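
The protocol gives a cassette/arm ratio of ~1000/1 without saying whether this is by mass or by number of molecules. As a hedged illustration of the difference, the sketch below converts a mass ratio into a molar ratio for double-stranded fragments. The fragment lengths (2000 bp cassettes, 5000 bp arms) are hypothetical placeholders, not values from the text.

```python
def molar_ratio_from_mass(mass_cassette_ng: float, mass_arm_ng: float,
                          cassette_bp: float, arm_bp: float) -> float:
    """Molar cassette/arm ratio for double-stranded DNA fragments.

    The number of molecules is proportional to mass / length, so the
    per-base-pair mass constant cancels out of the ratio.
    """
    return (mass_cassette_ng / cassette_bp) / (mass_arm_ng / arm_bp)


# 100 µg of cassettes against 100 ng of arms is a 1000:1 mass ratio.
# Hypothetical lengths: 2000 bp per cassette, 5000 bp per arm.
ratio = molar_ratio_from_mass(100_000, 100, cassette_bp=2000, arm_bp=5000)
print(f"molar cassette/arm ratio: {ratio:.0f}")
```

With shorter cassettes than arms, the molar excess of cassettes is larger than the mass ratio suggests, which matters when the ratio is used to steer concatemer size.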

Example 4: cDNA libraries used in the production of EVACs

1. Daucus carota, carrot root library:
• Full length
• Oligo dT primed, directional cDNA library
• cDNA library made using a pool of 3 Evolve EVE 4, 5 & 8 vectors (Fig. 4, 5, 6)
• Number of independent clones: 41.6 x 10^6
• Average size: 0.9 - 2.9 kb
• Number of different genes present: 5000 - 10000

2. Xanthophyllomyces dendrorhous (yeast), whole organism library
• Full length
• Oligo dT primed, directional cDNA library
• cDNA library made using a pool of 3 Evolve EVE 4, 5 & 8 vectors (Fig. 4, 5, 6)
• Number of independent clones: 48.0 x 10^6
• Average size: 1.0 - 3.8 kb
• Number of different genes present: 5000 - 10000

3. Target carotenoid gene cDNA library
• Full length and normalised
• Directional cDNA cloning
• Library made by cloning each gene individually in 2 Evolve EVE 4, 5 & 8 vectors
  (Fig. 4, 5, 6)
• Number of different genes: 48
• Species and genes used:
  • Gentiana sp., ggps, psy, pds, zds, lcy-b, lcy-e, bhy, zep
  • Rhodobacter capsulatus, idi, crtC, crtF
  • Erwinia uredovora, crtE, crtB, crtI, crtY, crtZ
  • Nostoc anabaena, zds
  • Synechococcus PCC7942, pds
  • Erwinia herbicola, crtE, crtB, crtI, crtY, crtZ
  • Staphylococcus aureus, crtM, crtN
  • Xanthophyllomyces dendrorhous, crtI, crtYb
  • Capsicum annuum, ccs, crtL
  • Nicotiana tabacum, crtL, bchy
  • Prochlorococcus sp., lcy-b, lcy-e
  • Saccharomyces cerevisiae, idi
  • Corynebacterium sp., crtI, crtYe, crtYf, crtEb
  • Lycopersicon esculentum, psy-1
  • Neurospora crassa, all



Example 5: Transformation of EVACs 
Example 5a: Transformation

1. Inoculate a single colony into 100 ml YPD broth and grow with aeration at 30°C 
to mid log. 2 x 10^ to 2 x 10^ cells/ml. 

2. Spin to pellet cells at 400 x g for 5 minutes; discard supernatant.
3. Resuspend cells in a total of 9 ml TE, pH 7.5. Spin to pellet cells and discard
   supernatant.
4. Gently resuspend cells in 5 ml 0.1 M Lithium/Cesium Acetate solution, pH 7.5.
5. Incubate at 30°C for 1 hour with gentle shaking.
6. Spin at 400 x g for 5 minutes to pellet cells and discard supernatant.
7. Gently resuspend in 1 ml TE, pH 7.5. Cells are now ready for transformation.
8. In a 1.5 ml tube combine:
   • 100 µl yeast cells
   • 5 µl Carrier DNA (10 mg/ml)
   • 5 µl Histamine Solution
   • 1/5 of an EVAC preparation in a 10 µl volume (max). (One EVAC
     preparation is made of 100 µg of concatenation reaction mixture)
9. Gently mix and incubate at room temperature for 30 minutes.
10. In a separate tube, combine 0.8 ml 50% (w/v) PEG 4000 and 0.1 ml TE and 0.1
    ml of 1 M LiAc for each transformation reaction. Add 1 ml of this PEG/TE/LiAc
    mix to each transformation reaction. Mix cells into solution with gentle pipetting.
11. Incubate at 30°C for 1 hour.
12. Heat shock at 42°C for 15 minutes; cool to 30°C.
13. Pellet cells in a microcentrifuge at high speed for 5 seconds and remove
    supernatant.
14. Resuspend in 200 µl of rich media and plate in appropriate selective media.
15. Incubate at 30°C for 48-72 hours until transformant colonies appear.

Example 5b: Transformation of EVACs using electroporation

100 ml of YPD is inoculated with one yeast colony and grown to OD600 = 1.3 to 1.5.
The culture is harvested by centrifuging at 4000 x g and 4°C. The cells are
resuspended in 16 ml sterile H2O. Add 2 ml 10 x TE buffer, pH 7.5 and swirl to mix.
Add 2 ml 10 x lithium acetate solution (1 M, pH 7.5) and swirl to mix. Shake gently
45 min at 30°C. Add 1.0 ml 0.5 M DTE while swirling. Shake gently 15 min at 30°C.
The yeast suspension is diluted to 100 ml with sterile water. The cells are washed
and concentrated by centrifuging at 4000 x g, resuspending the pellet in 50 ml ice-cold
sterile water, centrifuging at 4000 x g, resuspending the pellet in 5 ml ice-cold
sterile water, centrifuging at 4000 x g and resuspending the pellet in 0.1 ml ice-cold
sterile 1 M sorbitol. The electroporation was done using a BioRad Gene Pulser. In a
sterile 1.5-ml microcentrifuge tube 40 µl concentrated yeast cells were mixed with 5
µl 1:10 diluted EVAC preparation. The yeast-DNA mix is transferred to an ice-cold
0.2-cm-gap disposable electroporation cuvette and pulsed at 1.5 kV, 25 µF, 200 Ω.
1 ml ice-cold 1 M sorbitol is added to the cuvette to recover the yeast. Aliquots are
spread on selective plates containing 1 M sorbitol. Incubate at 30°C until colonies
appear.
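
The pulse settings above (1.5 kV across a 0.2 cm gap, 25 µF, 200 Ω) imply a nominal field strength and time constant, which the following sketch works out. It assumes an idealised capacitor discharge through the 200 Ω resistor alone; the actual time constant also depends on the sample resistance and is not reported in the text.

```python
def field_strength_kv_per_cm(voltage_kv: float, gap_cm: float) -> float:
    """Nominal electric field across the cuvette gap."""
    return voltage_kv / gap_cm


def time_constant_ms(resistance_ohm: float, capacitance_uf: float) -> float:
    """Nominal RC time constant of an exponential-decay pulse, in milliseconds."""
    seconds = resistance_ohm * capacitance_uf * 1e-6
    return seconds * 1e3


# Settings given in Example 5b.
field = field_strength_kv_per_cm(1.5, 0.2)
tau_ms = time_constant_ms(200, 25)
print(f"field = {field:.1f} kV/cm, nominal time constant = {tau_ms:.1f} ms")
```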

Example 6: Rare restriction enzymes with recognition sequence and cleavage points

In this example, rare restriction enzymes are listed together with their recognition
sequence and cleavage points. (^) indicates cleavage points in the 5'-3' sequence and (_)
indicates cleavage points in the complementary sequence.

W = A or T; N = A, C, G, or T
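
The cleavage notation defined above can be read mechanically. The sketch below is an illustration only (the function names are hypothetical): it splits a site written in this notation into the bare recognition sequence and the two cut positions, and then classifies the resulting end. It assumes that a site with only a (^) mark is cut at the same position in both strands, which is how the blunt cutters are written in this example.

```python
def parse_site(notation: str):
    """Split a site such as 'GG^CGCG_CC' into (sequence, top_cut, bottom_cut).

    '^' marks the cut in the 5'-3' strand and '_' the cut in the
    complementary strand; both are returned as offsets into the sequence.
    """
    bases = []
    top_cut = bottom_cut = None
    for ch in notation:
        if ch == "^":
            top_cut = len(bases)
        elif ch == "_":
            bottom_cut = len(bases)
        else:
            bases.append(ch)
    return "".join(bases), top_cut, bottom_cut


def overhang(notation: str):
    """Classify the end produced by a cut as blunt, 5' overhang or 3' overhang."""
    seq, top, bottom = parse_site(notation)
    if top is None:
        raise ValueError("no '^' cleavage mark found")
    if bottom is None or bottom == top:
        return ("blunt", "")
    if top < bottom:
        return ("5' overhang", seq[top:bottom])
    return ("3' overhang", seq[bottom:top])


for site in ("GG^CGCG_CC", "TTA_AT^TAA", "ATTT^AAAT"):
    print(site, overhang(site))
```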

6a) Unique, palindromic overhang

AscI        GG^CGCG_CC
AsiSI       GCG_AT^CGC
CciNI       GC^GGCC_GC
CspBI       GC^GGCC_GC
FseI        GG_CCGG^CC
MchAI       GC^GGCC_GC
NotI        GC^GGCC_GC
PacI        TTA_AT^TAA
SbfI        CC_TGCA^GG
SdaI        CC_TGCA^GG
SgfI        GCG_AT^CGC
SgrAI       CR^CCGG_YG
Sse232I     CG^CCGG_CG
Sse8387I    CC_TGCA^GG

6b) No overhang

BstRZ246I   ATTT^AAAT
BstSWI      ATTT^AAAT
MspSWI      ATTT^AAAT
MssI        GTTT^AAAC
PmeI        GTTT^AAAC
SmiI        ATTT^AAAT
SrfI        GCCC^GGGC
SwaI        ATTT^AAAT

6c) Non-palindromic and/or variable overhang

AarI
AbeI        CC^TCA_GC
AloI
BaeI
BbvCI       CC^TCA_GC
CpoI        CG^GWC_CG
CspI        CG^GWC_CG
Pfl27I      RG^GWC_CY
PpiI
PpuMI       RG^GWC_CY
PpuXI       RG^GWC_CY
Psp5II      RG^GWC_CY
PspPPI      RG^GWC_CY
RsrII       CG^GWC_CG
Rsr2I       CG^GWC_CG
SanDI       GG^GWC_CC
SapI        GCTCTTCN^NNN_
SdiI        GGCCN_NNN^NGGCC
SexAI       A^CCWGG_T
SfiI        GGCCN_NNN^NGGCC
Sse1825I    GG^GWC_CC
Sse8647I    AG^GWC_CT
VpaK32I     GCTCTTCN^NNN_

6d) Meganucleases

I-SceI      TAGGGATAA_CAGG^GTAAT
I-CeuI      ACGGTC_CTAA^GGTAG
I-CreI      AAACGTC_GTGA^GACAGTTT
I-SceII     GGTC_ACCCTGAAGTA
I-SceIII    GTTTTGG_TAAC^TATTTAT
Endo-SceI   GATGCTGC_AGGC^ATAGGCTTGTTTA
PI-SceI     GG_GTGC^GGAGAA
PI-PspI     TGGCAAACAGCTA_TTAT^GGGTATTATGGGT
I-PpoI      CTCTC_TTAA^GGTAG
HO          TTTCCGC_AACA^GT
I-TevI      NN_NN^NNTCAGTAGATGTTTTTCTTGGTCTACCGTTT

More meganucleases have been identified, but their precise recognition sequence
has not been determined, see e.g. www.meganuclease.com

Example 7: Concatemer size limitation experiments (use of stoppers)

Materials used:

pYAC4 (Sigma; Burke et al. 1987, Science, vol 236, p 806) was digested w. EcoRI
and BamHI and dephosphorylated.

pSE420 (Invitrogen) was linearised using EcoRI and used as the model fragment
for concatenation.

T4 DNA ligase (Amersham Pharmacia Biotech) was used for ligation according to
the manufacturer's instructions.

Method: Fragments and arms were mixed in the ratios (concentrations are in arbitrary
units) indicated on figures 9a and 9b. Ligation was allowed to proceed for 1 h at
16 °C. Reaction was stopped by the addition of 1 µL 500 mM EDTA. Products were
analysed by standard agarose GE (1% agarose, ½ strength TBE) or by
PFGE (CHEF III, 1% LMP agarose, ½ strength TBE, angle 120, temperature 12 °C,
voltage 5.6 V/cm, switch time ramping 5 - 25 s, run time 30 h).



The results are shown in Figure 9, which shows that the size of the concatemers
is proportional to the ratio of cassettes to YAC arms.
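
One way to see why concatemer size would scale with the cassette/arm ratio is an idealised end-capping model, sketched below. This is not the analysis used in the example; it assumes that ligation goes to completion, that every arm ends up capping one end of a linear concatemer (acting as a stopper), and that no circular products form. Under those assumptions each concatemer consumes two arms, so the mean number of cassettes per concatemer is twice the molar cassette/arm ratio.

```python
def mean_cassettes_per_concatemer(cassette_molecules: float, arm_molecules: float) -> float:
    """Idealised mean chain length when every concatemer is capped by two arms."""
    if arm_molecules <= 0:
        raise ValueError("at least some arms are needed to cap the concatemers")
    concatemers = arm_molecules / 2.0  # two arms (stoppers) per linear concatemer
    return cassette_molecules / concatemers


# Mean size grows linearly with the cassette/arm ratio in this model.
for ratio in (10, 100, 1000):
    print(ratio, mean_cassettes_per_concatemer(ratio, 1))
```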

Example 8: Integration of expression cassettes into artificial chromosomes

Integration of expression cassettes into YAC12 was done essentially as done by
Sears D.D., Hieter P., Simchen G., Genetics, 1994, 138, 1055-1065.

An AscI site was introduced into the BglII site of the integration vectors pGS534 and
pGS525.

A β-galactosidase gene, as well as crtE, crtB, crtI and crtY from Erwinia uredovora
were cloned into pEVE4. These expression cassettes were ligated into the AscI site of the
modified integration vectors pGS534 and pGS525.

Linearised pGS534 and pGS525 containing the expression cassettes were
transformed into haploid yeast strains containing the appropriate target YAC which
carries the Ade+ gene. Red Ade- transformants were selected (the parent host strain
is red due to the ade2-101 mutation).

Additional confirmation of correct integration of the β-galactosidase expression cassette
was done using a β-galactosidase assay.

Example 9: Re-transformation of cells that already contain artificial
chromosomes to obtain at least 2 artificial chromosomes per cell

Yeast strains containing YAC12 (Sears D.D., Hieter P., Simchen G., Genetics, 1994,
138, 1055-1065) were transformed with EVACs following the protocol described in
Example 5a. The transformed cells were plated on plates that select for cells that
contained both YAC12 and EVACs.

Example 10: Example of different expression patterns ("phenotypes") obtained
using the same yeast clones under different expression conditions

Colonies were picked with a sterile toothpick and streaked sequentially onto plates
corresponding to the four repressed and/or induced conditions (-Ura/-Trp, -Ura/-Trp/-Met,
-Ura/-Trp/+200 µM Cu2SO4, -Ura/-Trp/-Met/+200 µM Cu2SO4). 20 mg
adenine was added to the media to suppress the ochre phenotype.
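
The four plate conditions are all on/off combinations of two medium changes (omitting methionine and adding copper) on top of the -Ura/-Trp base medium. The sketch below enumerates such combinations; it is illustrative only, the names are hypothetical, and "uM" is written for µM to keep the code ASCII-only. Adding a third independent switch would give eight conditions.

```python
from itertools import product

BASE_MEDIUM = "-Ura/-Trp"
# The two medium changes used in Example 10.
SWITCHES = ["-Met", "+200 uM Cu2SO4"]


def plate_conditions(base: str, switches: list) -> list:
    """List every on/off combination of the switches on top of the base medium."""
    conditions = []
    for flags in product((False, True), repeat=len(switches)):
        parts = [base] + [name for name, on in zip(switches, flags) if on]
        conditions.append("/".join(parts))
    return conditions


for condition in plate_conditions(BASE_MEDIUM, SWITCHES):
    print(condition)
```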
Claims

1. An artificial chromosome comprising at least one nucleotide concatemer, the
concatemer comprising in the 5'→3' direction cassettes of nucleotide sequences
of the general formula

[rs2-SP-PR-X-TR-SP-rs1]n

wherein

rs1 and rs2 together denote a restriction site,
SP denotes a spacer of at least two nucleotide bases,
PR denotes a promoter, capable of functioning in a cell,
X denotes an expressible nucleotide sequence,
TR denotes a terminator, and
SP denotes a spacer of at least two nucleotide bases, and
n ≥ 2.

2. The artificial chromosome according to claim 1, wherein the nucleotide
sequence comprises a DNA sequence selected from the group comprising
cDNA, genomic DNA.

3. The artificial chromosome according to claim 1, wherein the nucleotide
sequence is single stranded, or partly single stranded.

4. The artificial chromosome according to claim 1, wherein the nucleotide
sequence is double stranded.

5. The artificial chromosome according to any of the preceding claims, comprising
nucleotide sequences from one expression state.

6. The artificial chromosome according to any of the preceding claims 1 to 4,
comprising nucleotide sequences from at least two expression states.

7. The artificial chromosome according to any of the preceding claims, wherein the
rs1-rs2 restriction sites of at least two cassettes are recognised by the same
restriction enzyme, more preferably are identical.

8. The artificial chromosome according to claim 7, wherein the rs1-rs2 restriction
sites of essentially all cassettes are recognised by the same restriction enzyme,
more preferably are identical.

9. The artificial chromosome according to any of the preceding claims, wherein
substantially all cassettes are different.

10. The artificial chromosome according to any of the preceding claims, wherein the
difference comprises different promoters, and/or different expressible nucleotide
sequences, and/or different spacers and/or different terminators and/or different
introns.

11. The artificial chromosome according to any of the preceding claims, wherein n is
at least 10, such as at least 15, for example at least 20, such as at least 25, for
example at least 30, such as from 30 to 60 or more than 60, such as at least 75,
for example at least 100, such as at least 200, for example at least 500, such as
at least 750, for example at least 1000, such as at least 1500, for example at
least 2000.

12. The artificial chromosome according to any of the preceding claims, wherein the
artificial chromosome is selected from the group comprising a Yeast Artificial
Chromosome, a mega Yeast Artificial Chromosome, a Bacterial Artificial
Chromosome, a mouse artificial chromosome, a Mammalian Artificial
Chromosome, an Insect Artificial Chromosome, an Avian Artificial Chromosome,
a Bacteriophage Artificial Chromosome, a Baculovirus Artificial Chromosome, or
a Human Artificial Chromosome.

13. The artificial chromosome according to any of the preceding claims, wherein the
chromosome further comprises at least one selectable genetic marker, such as
a recessive or a dominant marker.

14. The artificial chromosome according to claim 13, comprising at least two
selectable genetic markers.

15. The artificial chromosome according to claim 13 or 14, wherein the at least one
marker comprises a marker selected from the group comprising LEU 2, TRP 1,
HIS 3, LYS 2, URA 3, ADE 2, amyloglucosidase, β-lactamase, CUP 1, G418R,
TUNR, KILk1, C230, SMR1, SFA, hygromycinR, methotrexateR,
chloramphenicolR, diuronR, zeocinR, canavanineR.

16. The artificial chromosome according to any of the preceding claims, being
designed to minimise the level of repeat sequences occurring in the concatemer.

17. The artificial chromosome according to any of the preceding claims, further
comprising an intron sequence between the promoter and the expressible
nucleotide sequence.

18. The artificial chromosome according to any of the preceding claims, wherein the
restriction site comprises a restriction site from the list in Example 6.

19. The artificial chromosome according to claim 18, wherein the restriction site
comprises at least 6 bases such as at least 8 bases, for example at least 10
bases.

20. The artificial chromosome according to any of the preceding claims, wherein the
GC content of the restriction site is more than 40%, preferably more than 50%,
more preferably equal to or more than 60%.

21. The artificial chromosome according to any of the preceding claims, wherein the
restriction enzyme recognising the restriction site produces sticky ends upon
cleavage of a double stranded nucleotide sequence, preferably wherein the
sticky ends have a pre-determined nucleotide sequence.

22. The artificial chromosome according to any of the preceding claims, further
comprising a spacer sequence between TR and rs1.

23. The artificial chromosome according to any of the preceding claims, wherein the
spacer and the optional spacer sequence together comprise at least 50 bases,
such as at least 60 bases, for example at least 75 bases, such as at least 100
bases, for example at least 150 bases, such as at least 200 bases, for example
at least 250 bases, such as at least 300 bases, for example at least 400 bases,
for example at least 500 bases, such as at least 750 bases, for example at least
1000 bases, such as at least 1100 bases, for example at least 1200 bases, such
as at least 1300 bases, for example at least 1400 bases, such as at least 1500
bases, for example at least 1600 bases, such as at least 1700 bases, for
example at least 1800 bases, such as at least 1900 bases, for example at least
2000 bases, such as at least 2100 bases, for example at least 2200 bases, such
as at least 2300 bases, for example at least 2400 bases, such as at least 2500
bases, for example at least 2600 bases, such as at least 2700 bases, for
example at least 2800 bases, such as at least 2900 bases, for example at least
3000 bases, such as at least 3200 bases, for example at least 3500 bases, such
as at least 3800 bases, for example at least 4000 bases, such as at least 4500
bases, for example at least 5000 bases, such as at least 6000 bases.

24. The artificial chromosome according to any of the preceding claims, wherein at
least one of the spacer sequences comprises between 50 and 2500 bases, such
as between 100 and 2500 bases, preferably between 200 and 2300 bases, more
preferably between 300 and 2100 bases, such as between 400 and 1900 bases,
more preferably between 500 and 1700 bases, such as between 600 and 1500
bases, more preferably between 700 and 1400 bases.

25. The artificial chromosome according to any of the preceding claims, wherein at
least one of the promoters, preferably substantially all promoters is/are an
externally controllable promoter, which are functional in a host cell.

26. The artificial chromosome according to claim 25, wherein at least one of the
promoters is an inducible promoter or a repressible promoter.

27. The artificial chromosome according to any of the preceding claims, comprising
at least one promoter comprising both repressible and inducible elements.

28. The artificial chromosome according to any of the preceding claims, comprising
at least one promoter being chemically inducible and/or repressible and/or
inducible/repressible by temperature, and/or inducible/repressible according to
mating type.

29. The artificial chromosome according to any of the preceding claims, comprising
at least one promoter being induced by any factor selected from the group
comprising carbohydrates, e.g. galactose; low inorganic phosphate levels;
temperature, e.g. low or high temperature shift; metals or metal ions, e.g. copper
ions; hormones, e.g. dihydrotestosterone or deoxycorticosterone; heat shock
(e.g. 39°C); methanol; redox-status; growth stage, e.g. developmental stage;
synthetic inducers, e.g. the gal inducer.

30. The artificial chromosome according to any of the preceding claims, wherein at
least one promoter is repressed by any factor selected from the group
comprising carbohydrates; galactose; low inorganic phosphate levels;
temperature; low or high temperature shift; metals or metal ions; copper ions;
hormones; dihydrotestosterone; deoxycorticosterone; heat shock (e.g. 39°C);
methanol; redox-status; growth stage; developmental stage; synthetic inducers;
gal inducer; high inorganic phosphate levels; methionine; glycerol.

31. The artificial chromosome according to any of the preceding claims, wherein at
least one promoter comprises a promoter selected from the group comprising
ADH 1, PGK 1, GAP 491, TPI, PYK, ENO, PMA 1, PHO5, GAL 1, GAL 2, GAL
10, MET25, ADH2, MEL 1, CUP 1, HSE, AOX, MOX, SV40, CaMV, Opaque-2,
GRE, ARE, PGK/ARE hybrid, CYC/GRE hybrid, TPI/α2 operator, AOX 1, MOX
A.

32. The artificial chromosome according to any of the preceding claims, wherein at
least one promoter is a synthetic promoter.

33. The artificial chromosome according to any of the preceding claims, wherein the
terminator is capable of functioning in a host cell.

34. An artificial chromosome comprising at least a first and a second expressible
nucleotide sequence under the control of a controllable promoter, the promoter
of the first expressible nucleotide sequence being controllable independently
from the promoter of the other expressible nucleotide sequence.

35. The artificial chromosome according to claim 1, comprising at least one
promoter comprising an inducible promoter or a repressible promoter.

36. The artificial chromosome according to any of the preceding claims 1 to 35,
comprising at least one promoter comprising both repressible and inducible
elements.

37. The artificial chromosome according to any of the preceding claims 1 to 36,
comprising at least one promoter being chemically inducible and/or repressible
and/or inducible/repressible by temperature, and/or inducible/repressible
according to mating type.

38. The artificial chromosome according to any of the preceding claims 1 to 37,
comprising at least one promoter being induced by any factor selected from the
group comprising carbohydrates, e.g. galactose; low inorganic phosphate
levels; temperature, e.g. low or high temperature shift; metals or metal ions, e.g.
copper ions; hormones, e.g. dihydrotestosterone or deoxycorticosterone; heat
shock (e.g. 39°C); methanol; redox-status; growth stage, e.g. developmental
stage; synthetic inducers, e.g. the gal inducer.

39. The artificial chromosome according to any of the preceding claims 1 to 38,
wherein at least one promoter is repressed by any factor selected from the
group comprising carbohydrates, e.g. galactose; low inorganic phosphate levels,
e.g. high inorganic phosphate levels; temperature, e.g. low or high temperature
shift; metals or metal ions, e.g. copper ions; hormones, e.g. dihydrotestosterone;
deoxycorticosterone; heat shock (e.g. 39°C); methanol; redox-status; growth
stage, e.g. developmental stage; synthetic inducers, e.g. the gal inducer;
methionine; glycerol.

40. The artificial chromosome according to any of the preceding claims 1 to 39,
wherein at least one promoter comprises a promoter selected from the group
comprising ADH 1, PGK 1, GAP 491, TPI, PYK, ENO, PMA 1, PHO5, GAL 1,
GAL 2, GAL 10, MET25, ADH2, MEL 1, CUP 1, HSE, AOX, MOX, SV40, CaMV,
Opaque-2, GRE, ARE, PGK/ARE hybrid, CYC/GRE hybrid, TPI/α2 operator,
AOX 1, MOX A.

41. The artificial chromosome according to any of the preceding claims 1 to 40, 
wherein at least one promoter is a synthetic promoter. 

42. The artificial chromosome according to any of the preceding claims 1 to 41,
comprising at least 10 expressible nucleotide sequences, such as at least 15, for
example at least 20, such as at least 25, for example at least 30, such as from
30 to 60 or more than 60, such as at least 75, for example at least 100, such as
at least 200, for example at least 500, such as at least 750, for example at least
1000, such as at least 1500, for example at least 2000.

43. The artificial chromosome according to any of the preceding claims 1 to 42,
comprising nucleotide sequences under the control of at least 3 different
promoters being regulated through external manipulations, such as at least 4
different promoters, for example at least 5 different promoters, such as at least 6
different promoters, for example at least 7 different promoters, such as at least 8
different promoters, for example at least 9 different promoters, such as at least
10 different promoters, for example at least 12 different promoters, such as at
least 15 different promoters, for example at least 20 different promoters, such as
at least 25 different promoters, for example at least 30 different promoters, such
as at least 50 different promoters or 100 different promoters.

44. The artificial chromosome according to any of the preceding claims 1 to 43,
comprising at least two nucleotide sequences coding for the same peptide or
two substantially identical nucleotide sequences under the control of at least 2
different promoters, such as 3 or 4 different promoters, for example at least 5
different promoters, such as at least 6 different promoters, for example at least 7
different promoters, such as at least 8 different promoters, for example at least 9
different promoters, such as at least 10 different promoters, for example at least
12 different promoters, such as at least 15 different promoters, for example at
least 20 different promoters, such as at least 25 different promoters, for example
at least 30 different promoters, such as at least 50 different promoters or 100
different promoters.

45. The artificial chromosome according to claim 44, comprising at least a selection 
of combinations of promoters and nucleotide sequences. 

46. The artificial chromosome according to claim 45, whereby the selection
comprises combinations from a two-dimensional array of promoters and
nucleotide sequences.

47. The artificial chromosome according to claim 45, whereby the selection
comprises a partial or complete combination from an n-dimensional array of
promoters, nucleotide sequences, spacers, terminators, and introns, wherein n
is an integer from 1 to 5.
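
As an illustration only, the n-dimensional array of claims 46 and 47 can be pictured as the Cartesian product of the available elements, from which a complete or partial selection is taken. The following minimal sketch assumes hypothetical element names (`seq_A`, `seq_B`, `seq_C`) and hypothetical helper functions (`enumerate_combinations`, `partial_selection`); none of these are taken from the application.

```python
from itertools import product
import random


def enumerate_combinations(**dimensions):
    """Return every combination of one element per dimension (the complete array)."""
    if not 1 <= len(dimensions) <= 5:
        raise ValueError("n must be an integer from 1 to 5")
    names = list(dimensions)
    return [dict(zip(names, combo))
            for combo in product(*(dimensions[name] for name in names))]


def partial_selection(combinations, k, seed=None):
    """Return k distinct combinations drawn at random (a partial selection)."""
    return random.Random(seed).sample(combinations, k)


# Hypothetical two-dimensional array: 2 promoters x 3 nucleotide sequences.
array_2d = enumerate_combinations(
    promoter=["MET25", "CUP1"],
    sequence=["seq_A", "seq_B", "seq_C"],
)
subset = partial_selection(array_2d, k=4, seed=0)
```

Adding further keyword arguments (for example `spacer=`, `terminator=`, `intron=`) extends the same sketch to up to five dimensions.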

48. The artificial chromosome according to any of the preceding claims 1 to 47,
wherein the artificial chromosome is selected from the group comprising a Yeast
Artificial Chromosome, a mega Yeast Artificial Chromosome, a Bacterial Artificial
Chromosome, a mouse artificial chromosome, a Mammalian Artificial
Chromosome, an Insect Artificial Chromosome, an Avian Artificial Chromosome,
a Bacteriophage Artificial Chromosome, a Baculovirus Artificial Chromosome, or
a Human Artificial Chromosome.

49. The artificial chromosome according to any of the preceding claims 1 to 48,
wherein the chromosome further comprises at least one selectable genetic
marker, such as a recessive or a dominant marker.

50. The artificial chromosome according to claim 49, comprising at least two
selectable genetic markers.

51. The artificial chromosome according to any of the preceding claims 49 to 50,
wherein the at least one marker comprises a marker selected from the group
comprising LEU 2, TRP 1, HIS 3, LYS 2, URA 3, ADE 2, Amyloglucosidase,
β-lactamase, CUP 1, G418ᴿ, TUNᴿ, KILk1, C230, SMR1, SFA, Hygromycinᴿ,
methotrexateᴿ, chloramphenicolᴿ, Diuronᴿ, Zeocinᴿ, Canavanineᴿ.

52. The artificial chromosome according to any of the preceding claims 1 to 51,
being designed to minimise the level of repeat sequences occurring in the
concatemer.

53. A host cell comprising at least one artificial chromosome comprising at least a
first and a second expressible nucleotide sequence under the control of a
controllable promoter, the promoter of the first expressible nucleotide sequence
being controllable independently from the promoter of the other expressible
nucleotide sequence.

54. The host cell according to claim 53, wherein the two different nucleotide
sequences are from the same expression state or from at least two different
expression states.

55. The cell according to claim 53, wherein the at least two different expression
states represent at least two different tissues, such as at least two organs, such
as at least two species, such as at least two genera.

56. The cell according to claim 55, wherein the two different species are from at
least two different phyla, such as from at least two different classes, such as
from at least two different divisions, more preferably from at least two different
sub-kingdoms, such as from at least two different kingdoms.

57. The cell according to claim 55 or 56, wherein one species is a eukaryote and
another species is a prokaryote.

58. The cell according to any of the preceding claims 53 to 57, comprising at least
two sub-sets of expressible nucleotide sequences, the expressible nucleotide
sequences of the first sub-set being under the control of the same controllable
promoter and the expressible nucleotide sequences of the second sub-set being
under the control of another controllable promoter.

59. The cell according to claim 58, comprising at least three sub-sets of expressible
nucleotide sequences, such as at least four sub-sets, for example at least five
sub-sets, such as at least six sub-sets, for example at least seven sub-sets,
such as at least eight sub-sets, for example at least nine sub-sets, such as at
least ten sub-sets, for example 11, 12, 15, 20, 25, 30, 50, 75 or at least 100
sub-sets of expressible nucleotide sequences, each sub-set comprising a unique
controllable promoter.

60. The cell according to claim 58 or 59, wherein each sub-set of nucleotide
sequences comprises a random and individual selection of expressible
nucleotide sequences from the same population of expressible nucleotide
sequences.
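
The arrangement of claims 58 to 60 can be sketched, purely for illustration, as independent random draws from one shared population, each draw being assigned its own promoter. The population labels, the function name `build_subsets` and the sub-set size below are assumptions made for the sketch and do not come from the application.

```python
import random


def build_subsets(population, promoters, subset_size, seed=None):
    """Draw one independent random sub-set per promoter from the same population."""
    if len(set(promoters)) != len(promoters):
        raise ValueError("each sub-set needs a unique promoter")
    rng = random.Random(seed)
    # Each sub-set is sampled separately, so sub-sets may overlap with each other.
    return {promoter: rng.sample(population, subset_size) for promoter in promoters}


# Hypothetical population of 50 expressible nucleotide sequences.
population = [f"cDNA_{i:03d}" for i in range(50)]
subsets = build_subsets(population, ["MET25", "CUP1", "GAL1"], subset_size=10, seed=1)
```

Because every sub-set is drawn individually, the same sequence may appear under more than one promoter, which is consistent with a "random and individual selection" from a common population.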

61. The cell according to any of the preceding claims 53 to 60, further comprising at
least one heterologous controllably expressible nucleotide sequence inserted
into a native chromosome and/or being located on a plasmid and/or a cosmid,
and/or a phage and/or a virus.

62. The cell according to any of claims 53 to 61, comprising a prokaryotic cell
selected from the group comprising bacteria such as Escherichia coli, Bacillus
subtilis, Streptomyces lividans, Streptomyces coelicolor, Pseudomonas
aeruginosa, Myxococcus xanthus.

63. The cell according to any of claims 53 to 61, comprising a eukaryotic cell
selected from the group comprising: yeasts; filamentous ascomycetes such as
Neurospora crassa and Aspergillus nidulans; plant cells such as those derived
from Nicotiana and Arabidopsis; mammalian host cells such as those derived
from humans, monkeys and rodents, such as Chinese hamster ovary (CHO)
cells, NIH/3T3, COS, 293, VERO, HeLa.

64. The cell according to claim 63, being a yeast cell selected from the group
comprising Kluyveromyces marxianus, K. lactis, Candida utilis, Phaffia
rhodozyma, Saccharomyces boulardii, Pichia pastoris, Hansenula polymorpha,
Yarrowia lipolytica, Candida paraffinica, Schwanniomyces castellii, Pichia
stipitis, Candida shehatae, Rhodotorula glutinis, Lipomyces lipofer,
Cryptococcus curvatus, Candida spp. (e.g. C. palmioleophila), Yarrowia
lipolytica, Candida guilliermondii, Candida, Rhodotorula spp., Saccharomycopsis
spp., Aureobasidium pullulans, Candida brumptii, Candida hydrocarbofumarica,
Torulopsis, Candida tropicalis, Saccharomyces cerevisiae, Rhodotorula rubra,
Candida flaveri, Eremothecium ashbyii, Pichia spp., Kluyveromyces, Hansenula,
Kloeckera, Pichia, Pachysolen spp., or Torulopsis bombicola.

65. The cell according to any of the preceding claims 53 to 64, having a mutation in 
a central biosynthetic pathway. 

66. The cell according to claim 65, comprising a selectable genetic marker inserted 
on at least one artificial chromosome complementing the mutation. 

67. The cell according to any of the preceding claims 53 to 66, comprising at least
one selectable genetic marker inserted on at least one artificial chromosome.

68. The cell according to claim 67, comprising at least two selectable genetic 
markers inserted on at least one artificial chromosome. 

69. The cell according to any of the preceding claims 53 to 68, wherein each
artificial chromosome comprises at least one unique selectable genetic marker.

70. The cell according to claim 69, wherein each artificial chromosome comprises 
two unique selectable markers. 

71. The cell according to claim 69, wherein all artificial chromosomes comprise one
common selectable marker.

72. The cell according to any of claims 53 to 69, wherein the nucleotide sequence of
at least one artificial chromosome, preferably the nucleotide sequence from
substantially all artificial chromosomes, has been designed to minimise the level
of repeat sequences in any one artificial chromosome.

73. The cell according to any of the preceding claims 53 to 72, wherein
recombination within the expressible nucleotide sequence has been minimised.

74. The cell according to any of the preceding claims 53 to 73, wherein at least one
artificial chromosome, preferably substantially all artificial chromosomes, is/are
artificial chromosome/s according to claims 1 to 52.

75. A host cell comprising at least four artificial chromosomes, wherein the four
chromosomes are different.

76. The cell according to claim 75, wherein at least one artificial chromosome
comprises an expressible nucleotide sequence under the control of a
controllable promoter.

77. The cell according to any of the preceding claims 75 to 76, further comprising at
least one heterologous controllably expressible nucleotide sequence inserted
into a native chromosome and/or being located on a plasmid and/or a cosmid,
and/or a phage and/or a virus.

78. The cell according to any of claims 75 to 77, comprising a prokaryotic cell
selected from the group comprising bacteria such as Escherichia coli, Bacillus
subtilis, Streptomyces lividans, Streptomyces coelicolor, Pseudomonas
aeruginosa, Myxococcus xanthus.

79. The cell according to any of claims 75 to 77, comprising a eukaryotic cell
selected from the group comprising: yeasts; filamentous ascomycetes such as
Neurospora crassa and Aspergillus nidulans; plant cells such as those derived
from Nicotiana and Arabidopsis; mammalian host cells such as those derived
from humans, monkeys and rodents, such as Chinese hamster ovary (CHO)
cells, NIH/3T3, COS, 293, VERO, HeLa.

80. The cell according to claim 79, being a yeast cell selected from the group
comprising Kluyveromyces marxianus, K. lactis, Candida utilis, Phaffia
rhodozyma, Saccharomyces boulardii, Pichia pastoris, Hansenula polymorpha,
Yarrowia lipolytica, Candida paraffinica, Schwanniomyces castellii, Pichia
stipitis, Candida shehatae, Rhodotorula glutinis, Lipomyces lipofer,
Cryptococcus curvatus, Candida spp. (e.g. C. palmioleophila), Yarrowia
lipolytica, Candida guilliermondii, Candida, Rhodotorula spp., Saccharomycopsis
spp., Aureobasidium pullulans, Candida brumptii, Candida hydrocarbofumarica,
Torulopsis, Candida tropicalis, Saccharomyces cerevisiae, Rhodotorula rubra,
Candida flaveri, Eremothecium ashbyii, Pichia spp., Kluyveromyces, Hansenula,
Kloeckera, Pichia, Pachysolen spp., or Torulopsis bombicola.

81. The cell according to any of the preceding claims 75 to 80, having a mutation in
a central biosynthetic pathway.

82. The cell according to claim 81, comprising a selectable genetic marker inserted
on an artificial chromosome complementing the mutation.

83. The cell according to any of the preceding claims 75 to 82, comprising at least 
one selectable genetic marker inserted on at least one artificial chromosome. 

84. The cell according to claim 83, comprising at least two selectable markers 
inserted on at least one artificial chromosome. 

85. The cell according to any of the preceding claims 75 to 84, wherein each
artificial chromosome comprises at least one unique selectable genetic marker.

86. The cell according to claim 85, wherein each artificial chromosome comprises at
least two unique selectable genetic markers.

87. The cell according to claim 85, wherein the artificial chromosomes comprise at
least one common selectable genetic marker.

88. The cell according to any of claims 75 to 85, wherein the nucleotide sequence of
at least one artificial chromosome, preferably the nucleotide sequence from
substantially all artificial chromosomes, has been designed to minimise the level
of repeat sequences in any one artificial chromosome.

89. The cell according to any of the preceding claims 75 to 88, wherein 
recombination within the expressible nucleotide sequence has been minimised. 

90. The cell according to any of the preceding claims 75 to 89, wherein at least one
artificial chromosome, preferably substantially all artificial chromosomes, is/are
artificial chromosome/s according to claims 1 to 52.

[Fig. 3: drawing not reproduced]

[Figure: EVE4 entry vector. Labelled elements: SrfI, AscI, ADH1, Spacer2, ColE1]

[Fig. 6: EVES entry vector. Labelled elements: SrfI, AmpR]

[Fig. 7: pYAC4-AscI, vector for providing EVACS arms. Labelled elements: AscI, ARS1, TRP1, Amp, URA3, pMB1, BamHI, HIS3]

[Fig. 11: drawing not reproduced]

SEQUENCE LISTING

<110> Evolva Biotech AS
      Goldsmith, Neil
      Sørensen, Alexandra M. P.
      Nielsen, Søren V.S.

<120> Artificial chromosomes comprising concatemers of expressible
      nucleotide sequences

<130> P 503 PC00

<150> DK PA 2001 00130
<151> 2001-01-25

<150> US 60/300,865
<151> 2001-06-27

<160> 4

<170> PatentIn version 3.1

<210> 1
<211> 3417
<212> DNA
<213> Synthetic

<220>
<221> misc_feature
<222> (1902)..(2759)
<223> Ampicillin resistance

<220>
<221> rep_origin
<222> (959)..(1899)
<223> ColE1

<220>
<221> misc_feature
<222> (2891)..(3347)
<223> f1-phage origin of replication

<220>
<221> terminator
<222> (495)..(823)
<223> ADH1

<220>
<221> promoter
<222> (49)..(437)
<223> Met25 promoter

<400> 1

ctgatttgcc cgggcagttc aggctcatca ggcgcgccat gcagggattc ttcggatgca 60
agggttcgaa tcccttagct ctcattattt tttgcttttt ctcttgaggt cacatgatcg 120
caaaatggca aatggcacgt gaagctgtcg atattgggga actgtggtgg ttggcaaatg 180
actaattaag ttagtcaagg cgccatcctc atgaaaactg tgtaacataa taaccgaagt 240
gtcgaaaagg tggcaccttg tccaattgaa cacgctcgat gaaaaaaata agatatatat 300
aaggttaagt aaagcgtctg ttagaaagga agtttttcct ttttcttgct ctcttgtctt 360
ttcatctact atttccttcg tgtaatacag ggtcgtcaga tacatagata caattctatt 420
acccccatcc atacaagctt ggcgccgaat tcgtcgaccc ggggatccgc ggccgcaggc 480
ctaaattgat ctagagcttt ggacttcttc gccagaggtt tggtcaagtc tccaatcaag 540
gttgtcggct tgtctacctt gccagaaatt tacgaaaaga tggaaaaggg tcaaatcgtt 600
ggtagatacg ttgttgacac ttctaaataa gcgaatttct tatgatttat gatttttatt 660
attaaataag ttataaaaaa aataagtgta tacaaatttt aaagtgactc ttaggtttta 720
aaacgaaaat tcttgttctt gagtaactct ttcctgtagg tcaggttgct ttctcaggta 780
tagcatgagg tcgctcttat tgaccacacc tctaccggca tgcccatggg ttaactgatc 840
aatgcatcct gcatggcgcg cctgatgagc ctgaactgcc cgggcaaatc agctggacgt 900
ctgcctgcat taatgaatcg gccaacgcgc ggggagaggc ggtttgcgta ttgggcgctc 960
ttccgcttcc tcgctcactg actcgctgcg ctcggtcgtt cggctgcggc gagcggtatc 1020
agctcactca aaggcggtaa tacggttatc cacagaatca ggggataacg caggaaagaa 1080
catgtgagca aaaggccagc aaaaggccag gaaccgtaaa aaggccgcgt tgctggcgtt 1140
tttccatagg ctccgccccc ctgacgagca tcacaaaaat cgacgctcaa gtcagaggtg 1200
gcgaaacccg acaggactat aaagatacca ggcgtttccc cctggaagct ccctcgtgcg 1260
ctctcctgtt ccgaccctgc cgcttaccgg atacctgtcc gcctttctcc cttcgggaag 1320
cgtggcgctt tctcatagct cacgctgtag gtatctcagt tcggtgtagg tcgttcgctc 1380
caagctgggc tgtgtgcacg aaccccccgt tcagcccgac cgctgcgcct tatccggtaa 1440
ctatcgtctt gagtccaacc cggtaagaca cgacttatcg ccactggcag cagccactgg 1500
taacaggatt agcagagcga ggtatgtagg cggtgctaca gagttcttga agtggtggcc 1560
taactacggc tacactagaa ggacagtatt tggtatctgc gctctgctga agccagttac 1620
cttcggaaaa agagttggta gctcttgatc cggcaaacaa accaccgctg gtagcggtgg 1680
tttttttgtt tgcaagcagc agattacgcg cagaaaaaaa ggatctcaag aagatccttt 1740
gatcttttct acggggtctg acgctcagtg gaacgaaaac tcacgttaag ggattttggt 1800
catgagatta tcaaaaagga tcttcaccta gatcctttta aattaaaaat gaagttttaa 1860
atcaatctaa agtatatatg agtaaacttg gtctgacagt taccaatgct taatcagtga 1920
ggcacctatc tcagcgatct gtctatttcg ttcatccata gttgcctgac tccccgtcgt 1980



WO 02/059330 PCT/DK02/00058



gtagataact acgatacggg agggcttacc atctggcccc agtgctgcaa tgataccgcg 2040
agacccacgc tcaccggctc cagatttatc agcaataaac cagccagccg gaagggccga 2100
gcgcagaagt ggtcctgcaa ctttatccgc ctccatccag tctattaatt gttgccggga 2160
agctagagta agtagttcgc cagttaatag tttgcgcaac gttgttgcca ttgctacagg 2220
catcgtggtg tcacgctcgt cgtttggtat ggcttcattc agctccggtt cccaacgatc 2280
aaggcgagtt acatgatccc ccatgttgtg caaaaaagcg gttagctcct tcggtcctcc 2340
gatcgttgtc agaagtaagt tggccgcagt gttatcactc atggttatgg cagcactgca 2400
taattctctt actgtcatgc catccgtaag atgcttttct gtgactggtg agtactcaac 2460
caagtcattc tgagaatagt gtatgcggcg accgagttgc tcttgcccgg cgtcaatacg 2520
ggataatacc gcgccacata gcagaacttt aaaagtgctc atcattggaa aacgttcttc 2580
ggggcgaaaa ctctcaagga tcttaccgct gttgagatcc agttcgatgt aacccactcg 2640
tgcacccaac tgatcttcag catcttttac tttcaccagc gtttctgggt gagcaaaaac 2700
aggaaggcaa aatgccgcaa aaaagggaat aagggcgaca cggaaatgtt gaatactcat 2760
actcttcctt tttcaatatt attgaagcat ttatcagggt tattgtctca tgagcggata 2820
catatttgaa tgtatttaga aaaataaaca aataggggtt ccgcgcacat ttccccgaaa 2880
agtgccacct gacgcgccct gtagcggcgc attaagcgcg gcgggtgtgg tggttacgcg 2940
cagcgtgacc gctacacttg ccagcgccct agcgcccgct cctttcgctt tcttcccttc 3000
ctttctcgcc acgttcgccg gctttccccg tcaagctcta aatcgggggc tccctttagg 3060
gttccgattt agtgctttac ggcacctcga ccccaaaaaa cttgattagg gtgatggttc 3120
acgtagtggg ccatcgccct gatagacggt ttttcgccct ttgacgttgg agtccacgtt 3180
ctttaatagt ggactcttgt tccaaactgg aacaacactc aaccctatct cggtctattc 3240
ttttgattta taagggattt tgccgatttc ggcctattgg ttaaaaaatg agctgattta 3300
acaaaaattt aacgcgaatt ttaacaaaat attaacgctt acaatttcca ttcgccattc 3360
aggctgcgca actgttggga agggcgatcg gtgcgggcct cttcgctatt acgccag 3417
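
Each row of a sequence listing of this kind carries blocks of ten residues followed by a running residue count, and the feature positions in the `<222>` fields are 1-based and inclusive. The short sketch below shows how those two conventions can be checked mechanically; the helper names `parse_rows` and `feature` are hypothetical, and the two sample rows are simply the first two rows of SEQ ID NO: 1.

```python
def parse_rows(text):
    """Join the residue blocks of each row, checking the running count at row end."""
    pieces = []
    total = 0
    for line in text.strip().splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines between rows
        *blocks, count = parts
        total += sum(len(block) for block in blocks)
        if int(count) != total:
            raise ValueError(f"row labelled {count} ends at residue {total}")
        pieces.append("".join(blocks))
    return "".join(pieces)


def feature(sequence, start, end):
    """Return the residues of a feature given as 1-based inclusive (start)..(end)."""
    return sequence[start - 1:end]


rows = """
ctgatttgcc cgggcagttc aggctcatca ggcgcgccat gcagggattc ttcggatgca 60
agggttcgaa tcccttagct ctcattattt tttgcttttt ctcttgaggt cacatgatcg 120
"""
seq = parse_rows(rows)
fragment = feature(seq, 49, 60)  # first residues of the region starting at position 49
```

A mismatch between the counted residues and the printed row number raises an error, which is a quick way to spot a row in which a residue was dropped or duplicated during transcription.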



<210> 2
<211> 3501
<212> DNA
<213> Synthetic

<220>
<221> misc_feature
<222> (1986)..(2843)
<223> Ampicillin resistance gene

<220>
<221> rep_origin
<222> (1043)..(1983)
<223> ColE1

<220>
<221> misc_feature
<222> (2975)..(3431)
<223> f1-phage origin of replication

<220>
<221> terminator
<222> (579)..(907)
<223> ADH1

<220>
<221> promoter
<222> (49)..(519)
<223> Cup1 promoter

<400> 2
ctgatttgcc cgggcagttc aggctcatca ggcgcgccat gcagggataa gccgatccca 60
ttaccgacat ttgggcgcta tacgtgcata tgttcatgta tgtatctgta tttaaaacac 120
ttttgtatta tttttcctca [illegible] taggtttata cggatgattt aat^accacc 180
tcaccaccct ttatttcagg ctgatatctt agccttgtta ctagttagaa aaagacattt 240
ttgctgtcag tcactgtcaa gagattcttt tgctggcatt tcttctagaa gcaaaaagag 300
cgatgcgtct tttccgctga accgttccag caaaaaagac taccaacgca atatggattg 360
tcagaatcat ataaaagaga agcaaataac tccttgtctt gtatcaattg cattataata 420
tcttcttgtt agtgcaatat catatagaag tcatcgaaat agatattaag aaaaacaaac 480
tgtacaatca atcaatcaat catcacataa aatgttcaaa gcttggcgcc gaattcgtcg 540
acccggggat ccgcggccgc aggcctaaat tgatctagag ctttggactt cttcgccaga 600
ggtttggtca agtctccaat caaggttgtc ggcttgtcta ccttgccaga aatttacgaa 660
aagatggaaa agggtcaaat cgttggtaga tacgttgttg acacttctaa ataagcgaat 720
ttcttatgat ttatgatttt tattattaaa taagttataa aaaaaataag tgtatacaaa 780
ttttaaagtg actcttaggt tttaaaacga aaattcttgt tcttgagtaa ctctttcctg 840
taggtcaggt tgctttctca ggtatagcat gaggtcgctc ttattgacca cacctctacc 900
ggcatgccca tgggttaact gatcaatgca tcctgcatgg cgcgcctgat gagcctgaac 960
tgcccgggca aatcagctgg acgtctgcct gcattaatga atcggccaac gcgcggggag 1020
aggcggtttg cgtattgggc gctcttccgc ttcctcgctc actgactcgc tgcgctcggt 1080
cgttcggctg cggcgagcgg tatcagctca ctcaaaggcg gtaatacggt tatccacaga 1140
atcaggggat aacgcaggaa agaacatgtg agcaaaaggc cagcaaaagg ccaggaaccg 1200
taaaaaggcc gcgttgctgg cgtttttcca taggctccgc ccccctgacg agcatcacaa 1260
aaatcgacgc tcaagtcaga ggtggcgaaa cccgacagga ctataaagat accaggcgtt 1320
tccccctgga agctccctcg tgcgctctcc tgttccgacc ctgccgctta ccggatacct 1380
gtccgccttt ctcccttcgg gaagcgtggc gctttctcat agctcacgct gtaggtatct 1440
cagttcggtg taggtcgttc gctccaagct gggctgtgtg cacgaacccc ccgttcagcc 1500
cgaccgctgc gccttatccg gtaactatcg tcttgagtcc aacccggtaa gacacgactt 1560
atcgccactg gcagcagcca ctggtaacag gattagcaga gcgaggtatg taggcggtgc 1620
tacagagttc ttgaagtggt ggcctaacta cggctacact agaaggacag tatttggtat 1680
ctgcgctctg ctgaagccag ttaccttcgg aaaaagagtt ggtagctctt gatccggcaa 1740
acaaaccacc gctggtagcg gtggtttttt tgtttgcaag cagcagatta cgcgcagaaa 1800
aaaaggatct caagaagatc ctttgatctt ttctacgggg tctgacgctc agtggaacga 1860
aaactcacgt taagggattt tggtcatgag attatcaaaa aggatcttca cctagatcct 1920
tttaaattaa aaatgaagtt ttaaatcaat ctaaagtata tatgagtaaa cttggtctga 1980
cagttaccaa tgcttaatca gtgaggcacc tatctcagcg atctgtctat ttcgttcatc 2040
catagttgcc tgactccccg tcgtgtagat aactacgata cgggagggct taccatctgg 2100
ccccagtgct gcaatgatac cgcgagaccc acgctcaccg gctccagatt tatcagcaat 2160
aaaccagcca gccggaaggg ccgagcgcag aagtggtcct gcaactttat ccgcctccat 2220
ccagtctatt aattgttgcc gggaagctag agtaagtagt tcgccagtta atagtttgcg 2280
caacgttgtt gccattgcta caggcatcgt ggtgtcacgc tcgtcgtttg gtatggcttc 2340
attcagctcc ggttcccaac gatcaaggcg agttacatga tcccccatgt tgtgcaaaaa 2400
agcggttagc tccttcggtc ctccgatcgt tgtcagaagt aagttggccg cagtgttatc 2460
actcatggtt atggcagcac tgcataattc tcttactgtc atgccatccg taagatgctt 2520
ttctgtgact ggtgagtact caaccaagtc attctgagaa tagtgtatgc ggcgaccgag 2580
ttgctcttgc ccggcgtcaa tacgggataa taccgcgcca catagcagaa ctttaaaagt 2640
gctcatcatt ggaaaacgtt cttcggggcg aaaactctca aggatcttac cgctgttgag 2700
atccagttcg atgtaaccca ctcgtgcacc caactgatct tcagcatctt ttactttcac 2760
cagcgtttct gggtgagcaa aaacaggaag gcaaaatgcc gcaaaaaagg gaataagggc 2820
gacacggaaa tgttgaatac tcatactctt cctttttcaa tattattgaa gcatttatca 2880
gggttattgt ctcatgagcg gatacatatt tgaatgtatt tagaaaaata aacaaatagg 2940
ggttccgcgc acatttcccc gaaaagtgcc acctgacgcg ccctgtagcg gcgcattaag 3000
cgcggcgggt gtggtggtta cgcgcagcgt gaccgctaca cttgccagcg ccctagcgcc 3060
cgctcctttc gctttcttcc cttcctttct cgccacgttc gccggctttc cccgtcaagc 3120
tctaaatcgg gggctccctt tagggttccg atttagtgct ttacggcacc tcgaccccaa 3180
aaaacttgat tagggtgatg gttcacgtag tgggccatcg ccctgataga cggtttttcg 3240
ccctttgacg ttggagtcca cgttctttaa tagtggactc ttgttccaaa ctggaacaac 3300
actcaaccct atctcggtct attcttttga tttataaggg attttgccga tttcggccta 3360
ttggttaaaa aatgagctga tttaacaaaa atttaacgcg aattttaaca aaatattaac 3420
gcttacaatt tccattcgcc attcaggctg cgcaactgtt gggaagggcg atcggtgcgg 3480
gcctcttcgc tattacgcca g 3501



<210> 3
<211> 4188
<212> DNA
<213> Synthetic

<220>
<221> misc_feature
<222> (2673)..(3530)
<223> Ampicillin resistance gene

<220>
<221> rep_origin
<222> (1730)..(2670)
<223> ColE1

<220>
<221> misc_feature
<222> (3662)..(4118)
<223> f1-phage origin of replication

<220>
<221> terminator
<222> (1027)..(1355)
<223> ADH1

<220>
<221> promoter
<222> (582)..(969)
<223> Met25 promoter

<220>
<221> misc_feature
<222> (1365)..(1603)
<223> ARS1 (autonomous replicating sequence) for yeast replication

<220>
<221> misc_feature
<222> (49)..(574)
<223> lambda spacer DNA (22428-22923)

<400> 3



ctgatttgcc ccrqcrcaattc aggctcatca ggcgcgccat gcagggattc tggaaattgc 60
aacgaaggaa gaaacctcgt tgctggaagc ctggaagaag tatcgggtgt tgctgaaccg 120
tgfttgataca tcaactgcac ctgatattga gtggcctgct gtccctgtta tggagtaatc 180
gttttgtgat atgccgcaga aacgttgtat gaaataacgt tctgcggtta gttagtatat 240
tgtaaagctg agtattggtt tatttggcga ttattatctt caggagaata atggaagttc 300
tatgactcaa ttgttcatag tgtttacatc accgccaatt gcttttaaga ctgaacgcat 360
gaaatatggtz ttttcgtcat gttttgagtc tgctgttgat atttctaaag tcggtttttt 420
ttcttcattt tctctaacta ttttccatga aatacatttt tgattattat ttgaatcaat 480
tccaattacc tgaagtcttt catctataat tggcattgta tgtattggtt tattggagta 540
gatgcttgct tttctgagcc atagctctga tatcagatct tcttcggatg caagggttcg 600
aatcccttag ctctcattat tttttgcttt ttctcttgag gtcacatgat cgcaaaatgg 660
caaatggcac gtgaagctgt cgatattggg gaactgtggt ggttggcaaa tgactaatta 720
agttagtcaa ggcgccatcc tcatgaaaac tgtgtaacat aataaccgaa gtgtcgaaaa 780
ggtggcacct tgtccaattg aacacgctcg atgaaaaaaa taagatatat ataagc^taa 840
gtaaagcgtc tgttagaaag gaagtttttc ctttttcttg ctctcttgtc ttttcatcta 900
ctatttcctt cgtgtaatac agggtcgtca gatacataga tacaattcta ttacccccat 960
ccatacaagc ttggcgccga attcgtcgac ccggggatcc gcggccgcag gcctaaattg 1020
atctagagct ttggacttct tcgccagagg tttggtcaag tctccaatca aggttgtcgg 1080
cttgtctacc ttgccagaaa tttacgaaaa gatggaaaag ggtcaaatcg ttggtagata 1140
cgttgttgac acttctaaat aagcgaattt cttatgattt atgattttta ttattaaata 1200
agttataaaa aaaataagtg tatacaaatt ttaaagtgac tcttaggttt taaaacgaaa 1260
attcttgttc ttgagtaact ctttcctgta ggtcaggttg ctttctcagg tatagcatga 1320
ggtcgctctt attgaccaca cctctaccgg catgcccatg ggttcttttg aaaagcaagc 1380
ataaaagatc taaacataaa atctgtaaaa taacaagatg taaagataat gctaaatcat 1440
ttggcttttt gattgattgt acaggaaaat atacatcgca gggggttgac ttttaccatt 1500
tcaccgcaat ggaatcaaac ttgttgaaga gaatgttcac aggcgcatac gctacaatga 1560
cccgattctt gctagccttt tctcggtctt gcaaacaacc gccaactgat caatgcatcc 1620
tgcatggcgc gcctgatgag cctgaactgc ccgggcaaat cagctggacg tctgcctgca 1680
ttaatgaatc ggccaacgcg cggggagagg cggtttgcgt attgggcgct cttccgcttc 1740
ctcgctcact gactcgctgc gctcggtcgt tcggctgcgg cgagcggtat cagctcactc 1800 
aaaggcggta atacggttat ccacagaatc aggggataac gcaggaaaga acatgtgagc 1860 
aaaaggccag caaaaggcca ggaaccgtaa aaaggccgcg ttgctggcgt ttttccatag 1920
gctccgcccc cctgacgagc atcacaaaaa tcgacgctca agtcagaggt ggcgaaaccc 1980 

gacaggacta taaagatacc aggcgtttcc ccctggaagc tccctcgtgc gctctcctgt 2040

tccgaccctg ccgcttaccg gatacctgtc cgcctttctc ccttcgggaa gcgtggcgct 2100 

ttctcatagc tcacgctgta ggtatctcag ttcggtgtag gtcgttcgct ccaagctggg 2160 

ctgtgtgcac gaaccccccg ttcagcccga ccgctgcgcc ttatccggta actatcgtct 2220

tgagtccaac ccggtaagac acgacttatc gccactggca gcagccactg gtaacaggat 2280 

tagcagagcg aggtatgtag gcggtgctac agagttcttg aagtggtggc ctaactacgg 2340 

ctacactaga aggacagtat ttggtatctg cgctctgctg aagccagtta ccttcggaaa 2400 

aagagttggt agctcttgat ccggcaaaca aaccaccgct ggtagcggtg gtttttttgt 2460 

ttgcaagcag cagattacgc gcagaaaaaa aggatctcaa gaagatcctt tgatcttttc 2520

tacggggtct gacgctcagt ggaacgaaaa ctcacgttaa gggattttgg tcatgagatt 2580 

atcaaaaagg atcttcacct agatcctttt aaattaaaaa tgaagtttta aatcaatcta 2640 

aagtatatat gagtaaactt ggtctgacag ttaccaatgc ttaatcagtg aggcacctat 2700

ctcagcgatc tgtctatttc gttcatccat agttgcctga ctccccgtcg tgtagataac 2760 

tacgatacgg gagggcttac catctggccc cagtgctgca atgataccgc gagacccacg 2820 

ctcaccggct ccagatttat cagcaataaa ccagccagcc ggaagggccg agcgcagaag 2880 

tggtcctgca actttatccg cctccatcca gtctattaat tgttgccggg aagctagagt 2940

aagtagttcg ccagttaata gtttgcgcaa cgttgttgcc attgctacag gcatcgtggt 3000

gtcacgctcg tcgtttggta tggcttcatt cagctccggt tcccaacgat caaggcgagt 3060 

tacatgatcc cccatgttgt gcaaaaaagc ggttagctcc ttcggtcctc cgatcgttgt 3120 

cagaagtaag ttggccgcag tgttatcact catggttatg gcagcactgc ataattctct 3180 

tactgtcatg ccatccgtaa gatgcttttc tgtgactggt gagtactcaa ccaagtcatt 3240 

ctgagaatag tgtatgcggc gaccgagttg ctcttgcccg gcgtcaatac gggataatac 3300

cgcgccacat agcagaactt taaaagtgct catcattgga aaacgttctt cggggcgaaa 3360

actctcaagg atcttaccgc tgttgagatc cagttcgatg taacccactc gtgcacccaa 3420 

ctgatcttca gcatctttta ctttcaccag cgtttctggg tgagcaaaaa caggaaggca 3480

aaatgccgca aaaaagggaa taagggcgac acggaaatgt tgaatactca tactcttcct 3540 
ttttcaatat tattgaagca tttatcaggg ttattatctc atgagcggat acatatttga 3600
atgtatttag aaaaataaac aaataggggt tccgcgcaca tttccccgaa aagtgccacc 3660
tgacgcgccc tgtagcggcg cattaagcgc ggcgggtgtg gtggttacgc ofcacrccrtaac 3720
pgctacactt gccagcgccc tagcgcccgc tcctttcgct ttcttccctt cctttctccrc 3780
cacgttcgcc ggctttcccc gtcaagctct aaatcggggg ctccctttag acr fc t* r* r« "H" 3840
tagtgcttta cggcacctcg accccaaaaa acttgattag ggtgatggtt ^^^y i^cty ^g 3 3900
gccatcgccc tgatagacgg tttttcgccc tttgacgttg gagtccacgt ti c 1 1: t aat aor 3960
tggactcttg ttccaaactg gaacaacact caaccctatc tcggtctatt cttttgattt 4020
ataagggatt ttgccgattt cggcctattg gttaaaaaat gagctgattt aacaaaaatt 4080
taacgcgaat tiztaacaaaa tattaacgct tacaatttcc attcgccatt caggctgcgc 4140
aactgttggg aagggcgatc ggtgcgggcc tcttcgctat tacgccag 4188




<210> 4 

<211> 11466 

<212> DNA 

<213> Synthetic 

<220> 

<221> misc_feature

<222> (3560)..(4247)

<223> Tetrahymena thermophila macronuclear telomere



<220> 

<221> misc_feature

<222> (6024)..(6711)

<223> Tetrahymena thermophila macronuclear telomere 



<220> 

<221> misc_feature 

<222> (9644)..(10388)

<223> Autonomous replicating sequence



<220> 

<221> misc_feature 

<222> (10488)..(11465)

<223> Centromere IV 



<220> 

<221> rep_origin 

<222> (7198)..(7198)

<223> Origin of replication, pMB1

<220> 

<221> misc_feature 

<222> (1962)..(2765)

<223> URA3, orotidine-5'-phosphate decarboxylase coding sequence
<220>

<221> misc_feature

<222> (4893)..(5552)

<223> HIS3, imidazoleglycerolphosphate dehydratase, coding sequence
<220> 

<221> misc_feature

<222> (7956)..(8816)

<223> AP(R), beta-lactamase, ampR ampicillin resistance, coding sequence



<220> 

<221> misc_feature

<222> (9129)..(9803)

<223> TRP1, phosphoribosylanthranilate isomerase, coding sequence



<400> 4 
ttctcatgtt 


tgacagctta 


tcatcgataa 


gctttaatgc 




o u 


ttgctaacgc 


agtcaggcac 


cQtQtatQaa 


atchaacdat 






caccgtcacc 


ctggatgctg 


taggcatacrGr 


cttacrttata 




ion 


gcgggatatc 


gtccattccg 


acagcatcgc 


G aatcac fcat* 


yy*-y ^y*-.\-.yc uaycgc^aua 




tgpgttgatg 


caatttctat 


gcgcacccgt 


tc tccrcracfca 


w ^y ov..\^ y ^ 1^ t» L.yy \^ v^u 




ccgcccagtc 


ctgctcgctt 


cgctacttgq 


agccactatc 


3"-*- ^"V^y Vi»y w u^dcyy u.y dc 


^ o u 


w Vrf ^ V* 


cy i,.y ^dcCa 


at. ucccc uca 


gtataaattt 


cactctgaac catcttggaa 


420 


ggaccggtaa ttatttcaaa tctctttttc aattgtatat gtgttatgtt atgtagtata 480
ctctttcttc aacaattaaa tactctcggt agccaagttg gtttaaggcg caagacttta 540
atttatcact acggaattgg cgcgccaatt ccgtaatctt gagatcgggc gttcgatcgc 600
cccgggagat ttttttgttt tttatgtctt ccattcactt cccagacttg caagttgaaa 660
tatttctttc aagggaattg atcctctacg ccggacgcat cgtggccggc atcaccggcg 720
ccacaggtgc Sgttgctggc gcctatatcg ccgacatcac cgatggggaa gatcgggctc 780
gccacttcgg gctcatgagc gcttgtttcg gcgtgggtat ggtggcaggc cccgtggccg 840
ggggactgtt gggcgccatc tccttgcatg caccattcct tgcggcggcg gtgctcaacg 900
gcctcaacct actactgggc tgcttcctaa tgcaggagtc gcataaggga gagcgtcgac 960
cgatgccctt gagagccttc aacccagtqa gctccttccg gtgggcgcgg ggcatgacta 1020
tcgtcgccgc acttatgact gtcttcttta tcatgcaact cgtaggacag gtgccggcag 1080
cgctctgggt cattttcggc gaggaccgct ttcgctggag cgcgacgatg atcggcctgt 1140
cgcttgcggt attcggaatc ttgcacgccc tcgctcaagc cttcgtcact ggtcccgcca 1200
tcgccggcat


53 Ofri" <^ i~ ^ <~f 


ygcgu ucyuy 


cii^y ik^y dy y L. 


ggatggccuu 




CdC'Cyyydu.y 


Urww>y ^y I. L.y c> 


^ggccacgct 


a r-« a 4- ainnn 
a^-t^a cty 


a a ^ /— »+- 4" a a 
dwdy Cl^uCdd 


yy ducy c b>cy 


cggctCT. uac 


y ctv-. u» y c y CL t_ 


^y L»i.«d^yy (,»y 


a+* i" a ^ f^rr 
d L. L.dt»y (^^y 


cctcggcgag 


yy«^ u»y ua.yy 


u<yi^v^yVii<i,«wv«d 


a r*f+" t"rT+" r«i* 

L.dla'^ Ul«.y L.(^L. 


ycctccccgc 




c ucy dcu uy d 


a+"Of^^a a/^/^/^<^ 

d uyy day ccg 


gcggcacctc 


s A rr> ' 4~ 4™ Q 


gccaaucaau 


tcttgcggag 


aac t gt gaa t 


^ cL ciL< cl i«. a C C 


ducy Cjy uccg 


ccdCCuccag 


cagccgcacg 


\m i^^CICl <-> L. Q Ct 


t- f- r* A -h r* a i- i- -h 


u U L.^ UdU u 


Cuu u uCuuCg 


1 4- 1 f i- era 1- r» 


yyi^ddu.c uwc 


rra a r^a^a a rif^ 

gaacagaagg 


aagaacgaag 


trrrrt- -a i" "hal" 


dv^yL^dL<dL.y w 


a r^t" /^+* 4- rra arY 

dy uy t.L.yddy 


aaacacgaaa 


ClCLv> L.^^Cl^Oi^ 


ddCdddddCw 


cgcaggaaac 


gaagataaat 




uy C^dCUCau 


,^1 4** S 1 ■ 4» 

ccuay uccug 


titigctgccaa 


a a a a rtr^ a a a 
cLciclay dctaC 


ddd C u uy u y u 


4~ ^ A 4* 4'»<-f !■< 

y c u u cat ^gg 


atgtt:cgtiac 


L»cty u ^3 cloy c 


di. uayy uccc 


BASS ^ 4** 4— J i<4» ^ 

aaaa^ucg^ u 


tactaaaaac 


a 1~ ^ a 


ggagggcaca 


gttaagccgc 


taaaggcatt 


uo-C u.Ci^ uCy a. 


agacagaaaa 


ttfcgctgaca 


ttggtaatac 




cagaatagca 


gaat:gggcag 


acat'tacgaa 


C a.y y u a, u l.y u 


cagcyy uuug 


aag c aggcgg 


cagaagaag^ 


t- f *- 1" 1- al- 1- 
L>i«L»L.ycfcc>yc>i. 


dycdydduuy 


ucai^gcaagg 


gctccctatc 


y u-ciw L. i,.y CL 


^a 4*" ^T^^^f a 

cdi^ t^ycy ddy 


a a a a ^» 

dy cy dwdddy 


•a 4- 4- 4- 4- t^i- 4- ^ 4_ 

auutugctat 


c*^at.yyy ugg 


ddy ay duyaa 


t^^f^ 4*~ n ^•^T ^ 4* 4" 

yy utdCy au u 


ggttgattat 




agacgcattg 


ggtcaacagt 


atagaaccgt 




tattattgtt 


ggaagaggac 


t:a1:t:tgcaaa 




ttacagaaaa 


gcaggctiggg 


aagcat;a1ititi 


~ ai^i'aaaaaaf^ 
ctL* L. del ct cad etc 


'H rt^ a V a t- a a 

uguduiiauaa 


guaaacgca^ 


guacaccaaa 


i-ht-aat-f-at-a 
u i^ddL. i.dL.d 


4— « «— 1 <- 4~ 1 ♦»4" 

i^cay u uduca 


cccgggcgud 


augactuu ua 


ttggaaagaa 


aagggggggg 


gggcagcgtt 


gggtcctggc 


ctcctgtcgt 


tgaggacccg 


gctaggctgg 


cggggttgcc 


tcaccgatac 


gcgagcgaac 


gtgaagcgac 


tgctgctgca 
ggcggccgac


gcgctgggct 


1260 


ccccattatg 


attcttctcg 


1320 


gtccaggcag 


gtagatgacg 


1380 


cagcctaact 


tcgatcactg 


1440 


cacatggaac 


gggttggcat 


1500 


gttgcgtcgc 


ggtgcatgga 


1560 


gctaacggat 


tcaccactcc 


1620 


gcgcaaacca 


acccttggca 


1680 


cggcgcatcc 


ccccccccct 


1740 


atttcggttt 


ctttgaaatt 


1800 


gaaggagcac 


agacttagat 


1860 


ttgcccagta 


ttcttaaccc 


1920 


catgtcgaaa 


gctacatata 


1980 


gctatttaat 


atcatgcacg 


2040 


caccaaggaa 


ttactggagt 


2100 


acatgtggat 


atcttgactg 


2160 


atccgccaag 


tacaattttt 


2220 


agtcaaattg 


cagtactctg 


2280 


tgcacacggt 


gtggtgggcc 


2340 


aacaaaggaa 


cctagaggcc 


2400 


tactggagaa 


tatactaagg 


2460 


cggctttatt 


gctcaaagag 


2520 


gacacccggt 


gtgggtttag 


2580 


ggatgatgtg 


gtctctacag 


2640 


gggaagggat 


gctaaggtag 


2700 


gagaagatgc 


ggccagcaaa 


2760 


ctcacaaatt 


agagcttcaa 


2820 


taatgacgaa 


aaaaaaaaaa 


2880 


cacgggtgcg 


catgatcgtg 


2940 


ttactggtta 


gcagaatgaa 


3000 


aaacgtctgc 


gacctgagca 


3060 



WO 02/059330 PCT/DK02/00058


acaacatgaa tggtcttcgg tttccgtgtt tcgtaaagtc tggaaacgcg gaagtcagcg 3120 
ccctgcacca ttatgttccg gatctgcatc gcaggatgct gctggctacc ctgtggaaca 3180 
cctacatctg tattaacgaa gcgctggcat tgaccctgag tgatttttct ctggtcccgc 3240 
cgcatccata ccgccagttg tttaccctca caacgttcca gtaaccgggc atgttcatca 3300 
tcagtaaccc gtatcgtgag catcctctct cgtttcatcg gtatcattac ccccatgaac 3360
agaaattccc ccttacacgg aggcatcaag tgaccaaaca ggaaaaaacc gcccttaaca 3420 
tggcccgctt tatcagaagc cagacattaa cgcttctgga gaaactcaac gagctggacg 3480
cggatgaaca ggcagacatc tgtgaatcgc ttcacgacca cgctgatgag ctttaccgca 3540 
gccctcgagg gataagcttc atttttagat aaaatttatt aatcatcatt aatttcttga 3600 
aaaacatttt atttattgat cttttataac aaaaaaccct tctaaaagtt tatttttgaa 3660
tgaaaaactt ataaaaattt atgaaaacta caaaaaataa aatttttaat taaaataatt 3720 
ttgataagaa cttcaatctt tgactagcta gcttagtcat ttttgagatt taattaatat 3780 
tttatgttta ttcatatata aactattcaa aatattatag aatttaaaca ttttaacatc 3840 

ttaatcattc ataaataact aaaaatcaaa gtattacatc aataaataac ttttactcaa 3900 

tgtcaaagaa ttattggggt tggggttggg gttggggttg gggttggggt tggggttggg 3960 

gttggggttg gggttggggt tggggttggg gttggggttg gggttggggt tggggttggg 4020 

gttggggttg gggttggggt tggggttggg gttggggttg gggttggggt tggggttggg 4080 

gttggggttg gggttggggt tggggttggg gttggggttg gggttggggt tggggttggg 4140 

gttggggttg gggttggggt tggggttggg gttggggttg gggtgggaaa acagcattca 4200 

ggtattagaa gaatatcctg attcaggtga aaatattgtt gatgcgcggg atcctcgggg 4260 

acaccaaata tggcgatctc ggccttttcg tttcttggag ctgggacatg tttgccatcg 4320 

atccatctac caccagaacg gccgttagat ctgctgccac cgttgtttcc accgaagaaa 4380

ccaccgttgc cgtaaccacc acgacggttg ttgctaaaga agctgccacc gccacggcca 4440 

ccgttgtagc cgccgttgtt gttattgtag ttgctcatgt tatttctggc acttcttggt 4500

tttcctctta agtgaggagg aacataacca ttctcgttgt tgtcgttgat gcttaaattt 4560 

tgcacttgtt cgctcagttc agccataata tgaaatgctt ttcttgttgt tcttacggaa 4620 

taccacttgc cacctatcac cacaactaac tttttcccgt tcctccatct cttttatatt 4680 

ttttttctcg atcgagttca agagaaaaaa aaagaaaaag caaaaagaaa aaaggaaagc 4740 

gcgcctcgtt cagaatgaca cgtatagaat gatgcattac cttgtcatct tcagtatcat 4800 

actgttcgta tacatactta ctgacattca taggtataca tatatacaca tgtatatata 4860 

tcgtatgctg cagctttaaa taatcggtgt cactacataa gaacaccttt ggtggaggga 4920 
acatcgttgg taccattggg cgaggtggct tctcttatgg caaccgcaag agccttgaac 4980

gcactctcac tacggtgatg atcattcttg cctcgcagac aatcaacgtg gagggtaatt 5040 

ctgctagcct ctgcaaagct ttcaagaaaa tgcgggatca tctcgcaaga gagatctcct 5100 

actttctccc tttgcaaacc aagttcgaca actgcgtacg gcctgttcga aagatctacc 5160 

accgctctgg aaagtgcctc atccaaaggc gcaaatcctg atccaaacct ttttactcca 5220

cgcgccagta gggcctcttt aaaagcttga ccgagagcaa tcccgcagtc ttcagtggtg 5280 

tgatggtcgt ctatgtgtaa gtcaccaatg cactcaacga ttagcgacca gccggaatgc 5340

ttggccagag catgtatcat atggtccaga aaccctatac ctgtgtggac gttaatcact 5400 

tgcgattgtg tggcctgttc tgctactgct tctgcctctt tttctgggaa gatcgagtgc 5460 

tctatcgcta ggggaccacc ctttaaagag atcgcaatct gaatcttggt ttcatttgta 5520 

atacgcttta ctagggcttt ctgctctgtc atctttgcct tcgtttatct tgcctgctca 5580 

ttttttagta tattcttcga agaaatcaca ttactttata taatgtataa ttcattatgt 5640

gataatgcca atcgctaaga aaaaaaaaga gtcatccgct aggtggaaaa aaaaaaatga 5700

aaatcattac cgaggcataa aaaaatajtag agtgtactag aggaggccaa gagtaataga 5760 

aaaagaaaat tgcgggaaag gactgtgtta tgacttccct gactaatgcc gtgttcaaac 5820 

gatacctggc agtgactcct agcgctcacc aagctcttaa aacgagaatt aagaaaaagt 5880 

cgtcatcttt cgataagttt ttcccacagc aaagcaatag tagaaaaaaa caatgggaaa 5940 
cgttgaatga agacaaagcg tcgtggttta aaaggaaata cgctcacgta catgctaggg 6000
aacaggaccg tgcagcggat cccgcgcatc aacaatattt tcacctgaat caggatattc 6060 
ttctaatacc tgaatgctgt tttcccaccc caaccccaac cccaacccca accccaaccc 6120 
caaccccaac cccaacccca accccaaccc caaccccaac cccaacccca accccaaccc 6180 
caaccccaac cccaacccca accccaaccc caaccccaac cccaacccca accccaaccc 6240 
caaccccaac cccaacccca accccaaccc caaccccaac cccaacccca accccaaccc 6300 
caaccccaac cccaacccca accccaaccc caaccccaac cccaacccca accccaataa 6360
ttctttgaca ttgagtaaaa gttatttatt gatgtaatac tttgattttt agttatttat 6420
gaatgattaa gatgttaaaa tgtttaaatt ctataatatt ttgaatagtt tatatatgaa 6480 
taaacataaa atattaatta aatctcaaaa atgactaagc tagctagtca aagattgaag 6540
ttcttatcaa aattatttta attaaaaatt ttattttttg tagttttcat aaatttttat 6600 
aagtttttca ttcaaaaata aacttttaga agggtttttt gttataaaag atcaataaat 6660
aaaatgtttt tcaagaaatt aatgatgatt aataaatttt atctaaaaat gaagcttatc 6720 
cctcgagggc tgcctcgcgc gtttcggtga tgacggtgaa aacctctgac acatgcagct 6780
cccggagacg gtcacagctt gtctgtaagc ggatgccggg agcagacaag cccgtcaggg 6840

cgcgtcagcg ggtgttggcg ggtgtcgggg cgcagccatg acccagtcac gtagcgatag 6900 

cggagtgtat actggcttaa ctatgcggca tcagagcaga ttgtactgag agtgcaccat 6960

atgcggtgtg aaataccgca cagatgcgta aggagaaaat accgcatcag gcgctcttcc 7020 

gcttcctcgc tcactgactc gctgcgctcg gtcgttcggc tgcggcgagc ggtatcagct 7080 

cactcaaagg cggtaatacg gttatccaca gaatcagggg ataacgcagg aaagaacatg 7140

tgagcaaaag gccagcaaaa ggccaggaac cgtaaaaagg ccgcgttgct ggcgtttttc 7200 

cataggctcc gcccccctga cgagcatcac aaaaatcgac gctcaagtca gaggtggcga 7260 

aacccgacag gactataaag ataccaggcg tttccccctg gaagctccct cgtgcgctct 7320 

cctgttccga ccctgccgct taccggatac ctgtccgcct ttctcccttc gggaagcgtg 7380 

gcgctttctc atagctcacg ctgtaggtat ctcagttcgg tgtaggtcgt tcgctccaag 7440 

ctgggctgtg tgcacgaacc ccccgttcag cccgaccgct gcgccttatc cggtaactat 7500 

cgtcttgagt ccaacccggt aagacacgac ttatcgccac tggcagcagc cactggtaac 7560 

aggattagca gagcgaggta tgtaggcggt gctacagagt tcttgaagtg gtggcctaac 7620 

tacggctaca ctagaaggac agtatttggt atctgcgctc tgctgaagcc agttaccttc 7680 

ggaaaaagag ttggtagctc ttgatccggc aaacaaacca ccgctggtag cggtggtttt 7740 

tttgtttgca agcagcagat tacgcgcaga aaaaaaggat ctcaagaaga tcctttgatc 7800 

ttttctacgg ggtctgacgc tcagtggaac gaaaactcac gttaagggat tttggtcatg 7860

agattatcaa aaaggatctt cacctagatc cttttaaatt aaaaatgaag ttttaaatca 7920

atctaaagta tatatgagta aacttggtct gacagttacc aatgcttaat cagtgaggca 7980 

cctatctcag cgatctgtct atttcgttca tccatagttg cctgactccc cgtcgtgtag 8040 

ataactacga tacgggaggg cttaccatct ggccccagtg ctgcaatgat accgcgagac 8100 

ccacgctcac cggctccaga tttatcagca ataaaccagc cagccggaag ggccgagcgc 8160

agaagtggtc ctgcaacttt atccgcctcc atccagtcta ttaattgttg ccgggaagct 8220

agagtaagta gttcgccagt taatagtttg cgcaacgttg ttgccattgc tgcaggcatc 8280 

gtggtgtcac gctcgtcgtt tggtatggct tcattcagct ccggttccca acgatcaagg 8340

cgagttacat gatcccccat gttgtgcaaa aaagcggtta gctccttcgg tcctccgatc 8400

gttgtcagaa gtaagttggc cgcagtgtta tcactcatgg ttatggcagc actgcataat 8460

tctcttactg tcatgccatc cgtaagatgc ttttctgtga ctggtgagta ctcaaccaag 8520 

tcattctgag aatagtgtat gcggcgaccg agttgctctt gcccggcgtc aacacgggat 8580 

aataccgcgc cacatagcag aactttaaaa gtgctcatca ttggaaaacg ttcttcgggg 8640
cgaaaactct caaggatctt accgctgttg agatccagtt cgatgtaacc cactcgtgca 8700
cccaactgat cttcagcatc ttttactttc accagcgttt ctgggtgagc aaaaacagga 8760
aggcaaaatg ccgcaaaaaa gggaataagg gcgacacgga aatgttgaat actcatactc 8820
ttcctttttc aatattattg aagcatttat cagggttatt gtctcatgag cggatacata 8880
tttgaatgta tttagaaaaa taaacaaata ggggttccgc gcacatttcc ccgaaaagtg 8940
ccacctgacg tctaagaaac cattattatc atgacattaa cctataaaaa taggcgtatc 9000
acgaggccct ttcgtcttca agaattaatt cggtcgaaaa aagaaaagga gagggccaag 9060
agggagggca ttggtgacta ttgagcacgt gagtatacgt gattaagcac acaaaggcag 9120
cttggagtat gtctgttatt aatttcadag gtagttctgg tccattggtg aaagtttgcg 9180
gcttgcagag cacagaggcc gcagaatgtg ctctagattc cgatgctgac ttgctgggta 9240
ttatatgtgt gcccaataga aagagaacaa ttgacccggt tattgcaagg aaaatttcaa 9300
gtcttgtaaa agcatataaa aatagttcag gcactccgaa atacttggtt ggcgtgtttc 9360
gtaatcaacc taaggaggat gttttggctc tggtcaatga ttacggcatt gatatcgtcc 9420
aactgcatgg agatgagtcg tggcaagaat accaagagtt cctcggtttg ccagtibatita 9480
aaagactcgt atltitccaaaa gactgcaaca tactactcag tgcagcttca cagaaacctc 9540
attcgtttat tcccttgttt gattcagaag caggtgggac aggtgaactt ttggattgga 9600
actcgatttc tgactgggtt ggaaggcaag agagccccga aagcttacat tttatgttag 9660
ctggtggact gacgccagaa aatgttggtg atgcgcttag attaaatggc gttattggtg 9720
ttgatgtaag cggaggtgtg gagacaaatg gtigtaaaaga ctctaacaaa atiagcaaatl: 9780
tcgtcaaaaa tgctaagaaa taggttatta ctgagtagta tttatttaag tattgtttgt 9840
gcacttgcct gcaggccttt tgaaaagcaa gcataaaaga tctaaacata aaatctgtaa 9900
aataacaaga tgtaaagata atgctaaatc atttggcttt ttgattgatt gtacaggaaa 9960
atatacatcg cagggggttg acttttacca tttcaccgca atggaatcaa acttgttgaa 10020
gagaatgttc acaggcgcat acgctacaat gacccgattc ttgctagcct tttctcggtc 10080
ttgcaaacaa ccgccggcag cttagtatat aaatacacat gtacatacct ctctccgtat 10140
cctcgtaatc attttcttgt atttatcgtc ttttcgctgt aaaaacttta tcacacttat 10200
ctcaaataca cttattaacc gcttttacta ttatcttcta cgctgacagt aatatcaaac 10260
agtgacacat attaaacaca gtggtttctt tgcataaaca ccatcagcct caagtcgtca 10320
agtaaagatt tcgtgttcat gcagatagat aacaatctat atgttgataa ttagcgttgc 10380
ctcatcaatg cgagatccgt ttaaccggac cctagtgcac ttaccccacg ttcggtccac 10440
tgtgtgccga acatgctcct tcactatttt aacatgtgga attaattcta aatcctcttt 10500
atatgatctg ccgatagata gttctaagtc attgaggttc atcaacaatt ggattttctg 10560
tctactcgac ttcaggtaaa tgaaatgaga tgatacttgc ttatctcata gttaactcta 10620
agaggtgata cttatttact gtaaaactgt gacgataaaa ccggaaggaa gaataagaaa 10680
actcgaactg atctataatg cctattttct gtaaagagtt taagctatga aagcctcggc 10740
attttggccg ctcctaggta gtgctttttt tccaaggaca aaacagtttc tttttcttga 10800
gcaggtttta tgtttcggta atcataaaca ataaataaat tatttcattt atgtttaaaa 10860
ataaaaaata aaaaagtatt ttaaattttt aaaaaagttg attataagca tgtgaccttt 10920
tgcaagcaat taaattttgc aatttgtgat tttaggcaaa agttacaa1:t tctggctcgt 10980
gtaatatatg tatgctaaag tgaactttta caaagtcgat atggacttag tcaaaagaaa 11040
ttttcttaaa aatatatagc actagccaat ttagcacttc tttatgagat atattataga 11100
ctttattaag ccagatttgt gtattatatg tatttacccg gcgaatcatg gacatacatt 11160
ctgaaatagg taatattctc tatggtgaga cagcatagat aacctaggat acaagttaaa 11220
agctagtact gttttgcagt aatttttttc ttttttataa gaatgttacc acctaaataa 11280
gttataaagt caatagttaa gtttgatatt tgattgtaiaa ataccgtaat atatttgcat 11340
gatcaaaagg ctcaatgttg actagccagc atgtcaacca ctatattgat caccgatata 11400
tggacttcca caccaactag taatatgaca ataaattcaa gatattcttc atgagaatgg 11460
cccaga 11466